Abstract

Classical inequalities used in information theory such as those of de Bruijn,
Fisher, and Kullback carry over from the setting of probability theory on Euclidean
space to that of unimodular Lie groups. These are groups that possess
integration measures that are invariant under left and right shifts, which means 
that even in noncommutative cases they share many of the useful features of Eu- 
clidean space. In practical engineering terms the rotation group and Euclidean 
motion group are the unimodular Lie groups of most interest, and the develop- 
ment of information theory applicable to these Lie groups opens up the poten- 
tial to study problems relating to image reconstruction from irregular or random 
projection directions, information gathering in mobile robotics, satellite attitude 
control, and bacterial chemotaxis and information processing. Several definitions 
are extended from the Euclidean case to that of Lie groups including the Fisher 
information matrix, and inequalities analogous to those in classical information 
theory are derived and stated in the form of fifteen small theorems. In all such 
inequalities, addition of random variables is replaced with the group product, and 
the appropriate generalization of convolution of probability densities is employed. 



1 Introduction 



Shannon's brand of information theory is now more than six decades old, and some of 
the statistical methods developed by Fisher, Kullback, etc., are even older. Similarly,
the study of Lie groups is now more than a century old. Despite their relatively long 
and roughly parallel history, surprisingly few connections appear to have been made 
between these two vast fields. One such connection is in the area of ergodic theory 
[11 [31 [H] , where the Boltzmann-Shannon entropy is replaced with topological entropy 
[Ml H51 [Ml [77] . Ergodic theory developed in parallel with information theory and 
remains an active area of research among mathematicians to the current day (see e.g., 
[53]). Both use concepts of entropy (though these concepts are quite different from each 
other), and some common treatments have been given over the years (see e.g., jlO|). 
However, it should be noted that some of the cornerstones of information theory such 
as the de Bruijn inequality, Fisher information, Kullback-Leibler divergence, etc., do
not carry over to ergodic theory. And while connections between ergodic theory and 
Lie groups are quite strong, connections between information theory and Lie groups are 
virtually nonexistent. The goal of this paper is therefore to present a unified framework
of "information theory on Lie groups." As such, fifteen small theorems are presented 
that involve the structure and/or group operation of Lie groups. Unlike extensions of 
information theory to manifolds, the added structure inherent in Lie groups allows us to
draw much stronger parallels with inequalities of classical information theory, such as 
those presented in [17] . 

In recent years a number of connections have begun to emerge linking information 
theory, group theory, and geometry. A cross section of that work is reviewed here, and 
it is explained how the results of this paper are distinctly different from prior works. 

In the probability and statistics literature, the statistical properties of random walks 
and limiting distributions on Lie groups have been studied extensively by examining the
properties of iterated convolutions [28] [35] [38] [53] [61] [62] [68]. The goal in many of these
works is to determine the form of the limiting distribution, and the speed of conver- 
gence to it. This is a problem closely related to those in information theory. However, to 
the author's knowledge concepts such as entropy, Fisher information, Kullback-Leibler
divergence, etc., are not used significantly in those analyses. Rather, techniques of harmonic
analysis (Fourier analysis) on Lie groups are used, such as the methods described
in[2l[32l[S6l[^[70l[Zll[23[25l[HQ|. Indeed, to the best of the author's knowledge the 
only work that uses the concept and properties of information-theoretic (as opposed to 
topological) entropy on Lie groups is that of Johnson and Suhov [iOl [S]. Their goal
was to use the Kullback-Leibler divergence between probability density functions on 
compact Lie groups to study the convergence to uniformity under iterated convolutions, 
in analogy with what was done by Linnik [51] and Barron [6] in the commutative case.
The goal of the present paper is complementary: using some of the same tools, many 
of the major defined quantities and inequalities of (differential) information theory are 
extended from $\mathbb{R}^n$ to the context of unimodular Lie groups, which form a broader class
of Lie groups than compact ones. 

The goal here is to define and formalize probabilistic and information-theoretic quan- 
tities that are currently arising in scenarios such as robotics [48l [M] [56l [65l [72l [58] 
ITTl [75] , bacterial motion [21 [TJ] , and parts assembly in automated manufacturing systems
[131 US [m [Sni [Sni. The topics of detection, tracking, estimation and control on
Lie groups have been studied extensively over the past four decades. For example, see
[Tl[Til[IHl[lSl[ll[n71[21[5Hl[7i[SSl[51[7H] (and references therein). Many of these 
problems involve probability densities on the group of rigid-body motions. However, 
rather than focusing only on rigid-body motions, a general information theory on the 
much broader class of unimodular Lie groups is presented here with little additional 
effort. 

Several other research areas that would initially appear to be related to the present 
work have received intensive interest. For decades, Amari has developed the concept 
of information geometry [5] in which the Fisher information matrix is used to define a 
Riemannian metric tensor on spaces of probability distributions, thereby allowing those 
spaces to be viewed as Riemannian manifolds. This provides a connection between 
information theory and differential geometry. However, in information geometry, the 
probability distributions themselves (such as Gaussian distributions) are defined on a 
Euclidean space, rather than on a Lie group. 

A different kind of connection between information theory and geometry has been 
established in the context of medical imaging and computer vision in which probability 
densities on manifolds are analyzed using information-theoretic techniques [53]. However,
a manifold generally does not have an associated group operation, and so there is
no natural way to "add" random variables.

Relatively recently, Yeung and coworkers have used the structure of finite groups 
to derive new inequalities for discrete information. While this heavily involves the 
use of the theory of finite groups, the goal is to derive new inequalities for classical 
information theory, i.e., that which is concerned with discrete information related to 
finite sets. For example, see the work of Chan and Yeung [121 [20] and Zhang and Yeung
[81] . Li and Chong [49] and Chan [20] have addressed the relationship between group 
homomorphisms and information inequalities using the Ingleton inequality. In these 
works, the groups are discrete, and the new inequalities that are derived pertain to 
classical informational quantities. In contrast, the goal of the current presentation is 
to extend concepts from information theory to the case where variables "live in" a Lie 
group. 

While on the one hand work that connects geometry and information theory exists, 
and on the other hand work that connects finite-group theory and information theory 
exists, very little has been done along the lines of developing information theory on 
Lie groups, which in addition to possessing the structure of differential manifolds, also 
are endowed with group operations. Indeed, it would appear that applications such as 
deconvolution on Lie groups [21] (which can be formulated in an information-theoretic 
context [721 [10]), and the field of Simultaneous Localization and Mapping (or SLAM) 
[72] have preceded the development of formal information inequalities that take advantage
of the Lie-group structure of rigid-body motions.

This paper attempts to address this deficit with a two-pronged approach: (1) by 
collecting some known results from the functional analysis literature and reinterpreting 
them in information-theoretic terms (e.g. Gross' log-Sobolev inequality on Lie groups); 
(2) by defining information-theoretic quantities such as entropy, covariance and Fisher 
information matrix, and deriving inequalities involving these quantities that parallel
those in classical information theory. 

The remainder of this paper is structured as follows: Section [2] provides a brief 
review of the theory of unimodular Lie groups and gives several concrete examples (the 
rotation group, Euclidean motion group, Heisenberg group, and special linear group). 
An important distinction between information theory on manifolds and that on Lie 
groups is that the existence of the group operation in the latter case plays an important 
role. Section [3] defines entropy and relative entropy for unimodular Lie groups and 
proves some of their properties under convolution and marginalization over subgroups 
and coset spaces. The concept of the Fisher information matrix for probability densities 
on unimodular Lie groups is defined in Section [4] and several elementary properties 
are proven. This generalized concept of Fisher information is used in Section [5] to 
establish the de Bruijn inequality for unimodular Lie groups. Finally, these definitions
and properties are combined with recent results by others on log-Sobolev inequalities in 
Section [O] 

2 A Brief Review of Unimodular Lie Groups 

Rather than starting with formal definitions, examples of unimodular Lie groups are first 
introduced, their common features are enumerated, and then their formal properties are
stated.

2.1 An Introduction to Lie Groups via Examples

Perhaps one reason why there has been little cross-fertilization between the theory of 
Lie groups and information theory is that the presentation styles in these two fields 
are very different. Whereas Lie groups belong to pure mathematics, information theory 
emerged from engineering. Therefore, this section reviews some of the basic properties
of Lie groups from a concrete engineering perspective. All of the groups considered are 
therefore matrix Lie groups. 

2.1.1 Example 1: The Rotation Group 

Consider the set of 3 x 3 rotation matrices 

$$SO(3) = \{ R \in \mathbb{R}^{3 \times 3} \mid R R^T = I,\ \det R = +1 \}.$$

Here $SO(3)$ denotes the set of special orthogonal $3 \times 3$ matrices with real entries. It is
easy to verify that this set is closed under matrix multiplication and inversion. That
is, $R, R_1, R_2 \in SO(3) \Longrightarrow R_1 R_2, R^{-1} \in SO(3)$. Furthermore, the $3 \times 3$ identity
matrix is in this set, and the associative law $R_1(R_2 R_3) = (R_1 R_2) R_3$ holds, as is true
for matrix multiplication in general. This means that $SO(3)$ is a group, and is called
the special orthogonal (or rotation) group. Furthermore, it can be reasoned that the
nine independent entries in a $3 \times 3$ real matrix are constrained by the orthogonality
condition $R R^T = I$ to the point where a three-degree-of-freedom subspace remains. (The
condition $\det R = +1$ does not further constrain the dimension of this subspace, though
it does limit the discussion to one component of the space defined by the orthogonality
condition.)
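The group properties just listed can be checked numerically. The following Python sketch is only an illustration (the helper names `rot_z`, `rot_x`, and `is_rotation` are hypothetical and not part of the original development); it builds two rotations from elementary rotations and tests closure, inversion, and noncommutativity.

```python
import numpy as np

def rot_z(a):
    """Counterclockwise rotation by angle a about the third coordinate axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(a):
    """Counterclockwise rotation by angle a about the first coordinate axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def is_rotation(R, tol=1e-12):
    """True if R R^T = I and det R = +1, both up to the tolerance tol."""
    orthogonal = np.allclose(R @ R.T, np.eye(3), atol=tol)
    return bool(orthogonal and abs(np.linalg.det(R) - 1.0) < tol)

R1 = rot_z(0.3) @ rot_x(1.1)
R2 = rot_x(-0.7) @ rot_z(2.0)

closed_under_product = is_rotation(R1 @ R2)
closed_under_inverse = is_rotation(R1.T)      # the inverse of a rotation is its transpose
commutes = np.allclose(R1 @ R2, R2 @ R1)      # False for generic rotations
```

The matrix $-I$ is orthogonal but has determinant $-1$, so `is_rotation` rejects it; this is the second component excluded by the determinant condition.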

It is common to describe the three free degrees of freedom of the rotation group 
using parametrizations such as the ZXZ Euler angles: 

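$$R(\alpha, \beta, \gamma) = R_3(\alpha) R_1(\beta) R_3(\gamma) \qquad (1)$$

where $R_i(\theta)$ is a counterclockwise rotation about the $i^{th}$ coordinate axis. Another
popular description of 3D rotations is the axis-angle parametrization

$$R(\vartheta, \mathbf{n}) = I + \sin\vartheta \, N + (1 - \cos\vartheta) \, N^2 \qquad (2)$$

where $N$ is the unique skew-symmetric matrix such that $N\mathbf{x} = \mathbf{n} \times \mathbf{x}$ for any $\mathbf{x} \in \mathbb{R}^3$,
$\mathbf{n}$ is the unit vector pointing along the axis of rotation, and $\times$ is the vector cross
product. The "vee and hat" notation

$$N^{\vee} = \mathbf{n} \quad \text{and} \quad \hat{\mathbf{n}} = N \qquad (3)$$

is used to describe this relationship. Here $\|\mathbf{n}\| = (\mathbf{n} \cdot \mathbf{n})^{1/2} = 1$. The unit vector can be parameterized in
spherical coordinates as $\mathbf{n} = \mathbf{n}(\phi, \theta)$, and so a parametrization of the form $R = R(\vartheta, \phi, \theta)$
results. The angles $\vartheta, \phi, \theta$ are not the same as the Euler angles $\alpha, \beta, \gamma$.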
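The axis-angle formula and the hat and vee operations translate directly into code. The sketch below is illustrative only; the function names `hat`, `vee`, and `axis_angle` are hypothetical, and the axis is assumed to be a unit vector.

```python
import numpy as np

def hat(n):
    """Skew-symmetric matrix N such that N @ x equals np.cross(n, x)."""
    return np.array([[0.0, -n[2], n[1]],
                     [n[2], 0.0, -n[0]],
                     [-n[1], n[0], 0.0]])

def vee(N):
    """Inverse of hat: read the vector back out of a skew-symmetric matrix."""
    return np.array([N[2, 1], N[0, 2], N[1, 0]])

def axis_angle(theta, n):
    """R = I + sin(theta) N + (1 - cos(theta)) N^2 for a unit axis n."""
    N = hat(n)
    return np.eye(3) + np.sin(theta) * N + (1.0 - np.cos(theta)) * (N @ N)

n = np.array([1.0, 2.0, 2.0]) / 3.0    # a unit vector
theta = 0.9
R = axis_angle(theta, n)
```

The rotation leaves its own axis fixed, and its trace depends only on the angle, which is what makes the angle recoverable from the matrix.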

The group $SO(3)$ is a compact Lie group, and therefore has finite volume. When
using Euler angles, volume is computed with respect to the integration measure

$$dR = \frac{1}{8\pi^2} \sin\beta \, d\alpha \, d\beta \, d\gamma, \qquad (4)$$

which when integrated over $0 \le \alpha, \gamma \le 2\pi$ and $0 \le \beta \le \pi$ gives a value of 1. Indeed,
this result was obtained by construction by using the normalization of $8\pi^2$. The same
volume element will take on a different form when using the axis-angle parametrization,
in analogy with the way that the volume element in $\mathbb{R}^3$ can be expressed in the equivalent
forms $dx \, dy \, dz$ and $r^2 \sin\theta \, dr \, d\phi \, d\theta$ in Cartesian and spherical coordinates, respectively.
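As a quick sanity check on the normalization, the Euler-angle measure can be integrated numerically. The sketch below assumes the measure $\sin\beta \, d\alpha \, d\beta \, d\gamma / (8\pi^2)$ and uses a midpoint rule in $\beta$; the function name `so3_volume` is hypothetical.

```python
import numpy as np

def so3_volume(num=2000):
    """Midpoint-rule integral of sin(beta) / (8 pi^2) over the Euler-angle box.

    The alpha and gamma integrals each contribute a factor of 2 pi exactly,
    so only the beta integral is approximated numerically.
    """
    d_beta = np.pi / num
    beta = (np.arange(num) + 0.5) * d_beta
    beta_part = np.sum(np.sin(beta)) * d_beta
    return (2.0 * np.pi) ** 2 * beta_part / (8.0 * np.pi ** 2)

volume = so3_volume()
```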

Given any 3-parameter description of rotation, the angular velocity of a rigid body 
can be obtained from a rotation matrix. Angular velocity in the body-fixed and space- 
fixed reference frames can be written respectively as 

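$$\omega_r = J_r(\mathbf{q}) \dot{\mathbf{q}} \quad \text{and} \quad \omega_l = J_l(\mathbf{q}) \dot{\mathbf{q}}$$

where $\mathbf{q}$ is any parametrization (e.g., $\mathbf{q} = [\alpha, \beta, \gamma]^T$ or $\mathbf{q} = [\vartheta, \phi, \theta]^T$, where $T$ denotes
the transpose of a vector or matrix).

The Jacobian matrices $J_r(\mathbf{q})$ and $J_l(\mathbf{q})$ are computed from the parametrization $R(\mathbf{q})$
and the definition of the $\vee$ operation in (3) as

$$J_l(\mathbf{q}) = \left[ \left( \frac{\partial R}{\partial q_1} R^T \right)^{\vee}, \left( \frac{\partial R}{\partial q_2} R^T \right)^{\vee}, \left( \frac{\partial R}{\partial q_3} R^T \right)^{\vee} \right]$$

and

$$J_r(\mathbf{q}) = \left[ \left( R^T \frac{\partial R}{\partial q_1} \right)^{\vee}, \left( R^T \frac{\partial R}{\partial q_2} \right)^{\vee}, \left( R^T \frac{\partial R}{\partial q_3} \right)^{\vee} \right].$$
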

This gives a hint as to why the subscripts $l$ and $r$ are used: if derivatives with respect
to parameters appear on the 'right' of $R^T$, this is denoted with an $r$, and if they appear
on the 'left' then a subscript $l$ is used.
Explicitly for the Euler angles,



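$$J_l(\alpha, \beta, \gamma) = [\mathbf{e}_3, \ R_3(\alpha)\mathbf{e}_1, \ R_3(\alpha) R_1(\beta) \mathbf{e}_3] = \begin{pmatrix} 0 & \cos\alpha & \sin\alpha \sin\beta \\ 0 & \sin\alpha & -\cos\alpha \sin\beta \\ 1 & 0 & \cos\beta \end{pmatrix} \qquad (5)$$

and

$$J_r = R^T J_l = [R_3(-\gamma) R_1(-\beta) \mathbf{e}_3, \ R_3(-\gamma)\mathbf{e}_1, \ \mathbf{e}_3] = \begin{pmatrix} \sin\beta \sin\gamma & \cos\gamma & 0 \\ \sin\beta \cos\gamma & -\sin\gamma & 0 \\ \cos\beta & 0 & 1 \end{pmatrix}. \qquad (6)$$
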



Note that

$$|J_l| = |J_r| = \sin\beta$$

gives the factor that appears in the volume element $dR$ in (4). This is not a coincidence.
For any parametrization of $SO(3)$ of the form $R(\mathbf{q})$, the volume element can be expressed
as

$$dR = \frac{1}{8\pi^2} |J(\mathbf{q})| \, dq_1 \, dq_2 \, dq_3$$

where $J(\mathbf{q})$ can be taken to be either $J_r(\mathbf{q})$ or $J_l(\mathbf{q})$. Though these matrices are not
equal, their determinants are.
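The Jacobian definitions can be exercised without any closed-form algebra by differentiating $R(\mathbf{q})$ numerically. The sketch below is an illustration under stated assumptions: ZXZ Euler angles, central differences with a hypothetical step size `h`, and hypothetical helper names. Each column is built as $(\partial R / \partial q_i \, R^T)^{\vee}$ or $(R^T \, \partial R / \partial q_i)^{\vee}$.

```python
import numpy as np

def rot3(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot1(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def euler_zxz(q):
    """R(alpha, beta, gamma) = R3(alpha) R1(beta) R3(gamma)."""
    return rot3(q[0]) @ rot1(q[1]) @ rot3(q[2])

def vee(N):
    """Vector associated with a (numerically) skew-symmetric matrix."""
    return np.array([N[2, 1], N[0, 2], N[1, 0]])

def jacobians(q, h=1e-6):
    """Central-difference left and right Jacobians, built column by column."""
    q = np.asarray(q, dtype=float)
    R = euler_zxz(q)
    Jl, Jr = np.zeros((3, 3)), np.zeros((3, 3))
    for i in range(3):
        dq = np.zeros(3)
        dq[i] = h
        dR = (euler_zxz(q + dq) - euler_zxz(q - dq)) / (2.0 * h)
        Jl[:, i] = vee(dR @ R.T)   # derivative on the left of R^T
        Jr[:, i] = vee(R.T @ dR)   # derivative on the right of R^T
    return Jl, Jr

q = np.array([0.4, 1.2, -0.8])
Jl, Jr = jacobians(q)
```

For this sample point both determinants have absolute value $\sin\beta$, and the two Jacobians are related by $J_l = R J_r$.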

Whereas the set of all rotations together with matrix multiplication forms a noncommutative
($R_1 R_2 \neq R_2 R_1$ in general) Lie group, the set of all angular velocity vectors
$\omega_r$ and $\omega_l$ (or more precisely, their corresponding matrices, $\hat{\omega}_r$ and $\hat{\omega}_l$) together with
the operations of addition and scalar multiplication form a vector space. Furthermore,
this vector space is endowed with an additional operation, the cross product $\omega_1 \times \omega_2$
(or equivalently the matrix commutator $[\hat{\omega}_1, \hat{\omega}_2] = \hat{\omega}_1 \hat{\omega}_2 - \hat{\omega}_2 \hat{\omega}_1$). This makes the set
of all angular velocities a Lie algebra, which is denoted as $so(3)$ (as opposed to the Lie
group, $SO(3)$).

The Lie algebra so(3) consists of skew-symmetric matrices of the form 



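$$X = \begin{pmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{pmatrix} = \sum_{i=1}^{3} x_i X_i. \qquad (7)$$

The skew-symmetric matrices $\{X_i\}$ form a basis for the set of all such $3 \times 3$ skew-symmetric
matrices, and the coefficients $\{x_i\}$ are all real.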

Lie algebras and Lie groups are related in general by the exponential map. For 
matrix Lie groups (which are the only kind of Lie groups that will be discussed here),
the exponential map is the matrix exponential function. In this specific case,

$$\exp : so(3) \to SO(3).$$

It is well known (see [24] for derivation and references) that 

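$$R(\mathbf{x}) = e^{X} = I + \frac{\sin\|\mathbf{x}\|}{\|\mathbf{x}\|} X + \frac{1 - \cos\|\mathbf{x}\|}{\|\mathbf{x}\|^2} X^2 \qquad (8)$$

where $\|\mathbf{x}\| = (x_1^2 + x_2^2 + x_3^2)^{1/2}$. Indeed, (8) is simply a variation on (2) with $\mathbf{x} = \vartheta \mathbf{n}$.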

An interesting and useful fact is that except for a set of measure zero, all elements 
of $SO(3)$ can be captured with the parameters within the open ball defined by $\|\mathbf{x}\| < \pi$,
and the matrix logarithm of any group element parameterized in this range is also well
defined. It is convenient to know that the angle of the rotation, $\vartheta(R)$, is related to the
exponential parameters as $|\vartheta(R)| = \|\mathbf{x}\|$. Furthermore,

$$\log R = \frac{1}{2} \frac{\vartheta(R)}{\sin\vartheta(R)} (R - R^T)$$

where

$$\vartheta(R) = \cos^{-1}\left( \frac{\mathrm{trace}(R) - 1}{2} \right).$$

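Relatively simple analytical expressions have been derived for the Jacobian $J_l$ and
its inverse when rotations are parameterized as in (8):

$$J_l(\mathbf{x}) = I + \frac{1 - \cos\|\mathbf{x}\|}{\|\mathbf{x}\|^2} X + \frac{\|\mathbf{x}\| - \sin\|\mathbf{x}\|}{\|\mathbf{x}\|^3} X^2$$

and

$$J_l^{-1}(\mathbf{x}) = I - \frac{1}{2} X + \left( \frac{1}{\|\mathbf{x}\|^2} - \frac{1 + \cos\|\mathbf{x}\|}{2 \|\mathbf{x}\| \sin\|\mathbf{x}\|} \right) X^2.$$

The corresponding Jacobian $J_r$ is calculated as [23]

$$J_r(\mathbf{x}) = I - \frac{1 - \cos\|\mathbf{x}\|}{\|\mathbf{x}\|^2} X + \frac{\|\mathbf{x}\| - \sin\|\mathbf{x}\|}{\|\mathbf{x}\|^3} X^2.$$

Note that

$$J_l = J_r^T \quad \text{and} \quad J_l = R J_r.$$

The determinants are

$$|\det(J_l)| = |\det(J_r)| = \frac{2(1 - \cos\|\mathbf{x}\|)}{\|\mathbf{x}\|^2}.$$
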
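The relations among $R$, $J_l$, and $J_r$ in exponential coordinates are easy to confirm numerically. The sketch below assumes the standard closed forms $J_{l,r} = I \pm a X + b X^2$ with $a = (1 - \cos\|\mathbf{x}\|)/\|\mathbf{x}\|^2$ and $b = (\|\mathbf{x}\| - \sin\|\mathbf{x}\|)/\|\mathbf{x}\|^3$; the function names are hypothetical, and $\mathbf{x}$ is assumed nonzero.

```python
import numpy as np

def hat(x):
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def so3_exp(x):
    """Closed-form matrix exponential of hat(x); assumes x is nonzero."""
    t = np.linalg.norm(x)
    X = hat(x)
    return np.eye(3) + (np.sin(t) / t) * X + ((1.0 - np.cos(t)) / t ** 2) * (X @ X)

def so3_jacobians(x):
    """Assumed closed forms of J_l and J_r in exponential coordinates (x nonzero)."""
    t = np.linalg.norm(x)
    X = hat(x)
    a = (1.0 - np.cos(t)) / t ** 2
    b = (t - np.sin(t)) / t ** 3
    Jl = np.eye(3) + a * X + b * (X @ X)
    Jr = np.eye(3) - a * X + b * (X @ X)
    return Jl, Jr

x = np.array([0.3, -0.8, 0.5])
R = so3_exp(x)
Jl, Jr = so3_jacobians(x)
norm_x = np.linalg.norm(x)
det_expected = 2.0 * (1.0 - np.cos(norm_x)) / norm_x ** 2
```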

2.1.2 Example 2: The Euclidean Motion Group of the Plane

The Euclidean motion group of the plane can be thought of as the set of all matrices of 
the form 

$$g(x, y, \theta) = \begin{pmatrix} \cos\theta & -\sin\theta & x \\ \sin\theta & \cos\theta & y \\ 0 & 0 & 1 \end{pmatrix} \qquad (10)$$

together with the operation of matrix multiplication.

It is straightforward to verify that the form of these matrices is closed under multiplication
and inversion, and that $g(0, 0, 0) = I$, and that it is therefore a group. This
is often referred to as the special Euclidean group, and is denoted as $SE(2)$. Like
$SO(3)$, $SE(2)$ is three dimensional. However, unlike $SO(3)$, $SE(2)$ is not compact.
Nevertheless, it is possible to define a natural integration measure for $SE(2)$ as

$$dg = dx \, dy \, d\theta.$$

And while $SE(2)$ does not have finite volume (and so there is no single natural normalization
constant such as $8\pi^2$ in the case of $SO(3)$), this integration measure nevertheless
can be used to compute probabilities from probability densities.
Note that 

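$$g(x, y, \theta) = \exp(x X_1 + y X_2) \exp(\theta X_3)$$

where

$$X_1 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}; \quad X_2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}; \quad X_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

These matrices form a basis for the Lie algebra, $se(2)$. It is convenient to identify these
with the natural basis for $\mathbb{R}^3$ by defining $(X_i)^{\vee} = \mathbf{e}_i$. In so doing, any element of $se(2)$
can be identified with a vector in $\mathbb{R}^3$.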

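The Jacobians for this parametrization are then of the form

$$J_l = \left[ \left( \frac{\partial g}{\partial x} g^{-1} \right)^{\vee}, \left( \frac{\partial g}{\partial y} g^{-1} \right)^{\vee}, \left( \frac{\partial g}{\partial \theta} g^{-1} \right)^{\vee} \right] = \begin{pmatrix} 1 & 0 & y \\ 0 & 1 & -x \\ 0 & 0 & 1 \end{pmatrix}$$

and

$$J_r = \left[ \left( g^{-1} \frac{\partial g}{\partial x} \right)^{\vee}, \left( g^{-1} \frac{\partial g}{\partial y} \right)^{\vee}, \left( g^{-1} \frac{\partial g}{\partial \theta} \right)^{\vee} \right] = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Note that

$$|\det(J_l)| = |\det(J_r)| = 1.$$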
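The same finite-difference recipe used for rotations applies to $SE(2)$. The sketch below is an illustration only: it assumes the $(x, y, \theta)$ parametrization, the identification $(X_i)^{\vee} = \mathbf{e}_i$ with $X_1, X_2$ generating translations and $X_3$ generating rotation, and hypothetical helper names.

```python
import numpy as np

def se2(q):
    """Homogeneous matrix g(x, y, theta) of a planar rigid-body motion."""
    x, y, th = q
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def vee(X):
    """Coordinates (v1, v2, w) of X = v1 X1 + v2 X2 + w X3 in se(2)."""
    return np.array([X[0, 2], X[1, 2], X[1, 0]])

def jacobians(q, h=1e-6):
    """Central-difference left and right Jacobians for the chart q -> se2(q)."""
    q = np.asarray(q, dtype=float)
    g_inv = np.linalg.inv(se2(q))
    Jl, Jr = np.zeros((3, 3)), np.zeros((3, 3))
    for i in range(3):
        dq = np.zeros(3)
        dq[i] = h
        dg = (se2(q + dq) - se2(q - dq)) / (2.0 * h)
        Jl[:, i] = vee(dg @ g_inv)
        Jr[:, i] = vee(g_inv @ dg)
    return Jl, Jr

q = np.array([0.7, -1.3, 0.5])
Jl, Jr = jacobians(q)
```

Both determinants come out equal to one, which is why $dx \, dy \, d\theta$ needs no extra weighting factor in this chart.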

This parametrization is not unique, though it is probably the most well-known one. 
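As an alternative, consider the exponential parametrization $\exp : se(2) \to SE(2)$:

$$g(x_1, x_2, x_3) = \exp(x_1 X_1 + x_2 X_2 + x_3 X_3) = \exp \begin{pmatrix} 0 & -x_3 & x_1 \\ x_3 & 0 & x_2 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} \cos x_3 & -\sin x_3 & [x_2(-1 + \cos x_3) + x_1 \sin x_3]/x_3 \\ \sin x_3 & \cos x_3 & [x_1(1 - \cos x_3) + x_2 \sin x_3]/x_3 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (11)$$
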
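Comparing this with (10) it is clear that $x_3 = \theta$, but $x \neq x_1$ and $y \neq x_2$.
The Jacobians in this exponential parametrization are

$$J_r = \begin{pmatrix} \frac{\sin x_3}{x_3} & \frac{1 - \cos x_3}{x_3} & \frac{x_3 x_1 - x_2 + x_2 \cos x_3 - x_1 \sin x_3}{x_3^2} \\ \frac{\cos x_3 - 1}{x_3} & \frac{\sin x_3}{x_3} & \frac{x_1 + x_3 x_2 - x_1 \cos x_3 - x_2 \sin x_3}{x_3^2} \\ 0 & 0 & 1 \end{pmatrix}$$

and

$$J_l = \begin{pmatrix} \frac{\sin x_3}{x_3} & \frac{\cos x_3 - 1}{x_3} & \frac{x_3 x_1 + x_2 - x_2 \cos x_3 - x_1 \sin x_3}{x_3^2} \\ \frac{1 - \cos x_3}{x_3} & \frac{\sin x_3}{x_3} & \frac{-x_1 + x_3 x_2 + x_1 \cos x_3 - x_2 \sin x_3}{x_3^2} \\ 0 & 0 & 1 \end{pmatrix}.$$

It follows that

$$|\det(J_l)| = |\det(J_r)| = \frac{2(1 - \cos x_3)}{x_3^2}.$$
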
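The determinant in exponential coordinates can be confirmed by differentiating the closed-form exponential numerically. This is a sketch under stated assumptions: the closed form of $\exp(x_1 X_1 + x_2 X_2 + x_3 X_3)$ in (11) with $x_3 \neq 0$, the identification $(X_i)^{\vee} = \mathbf{e}_i$, and hypothetical helper names.

```python
import numpy as np

def se2_exp(x):
    """Closed form of exp(x1 X1 + x2 X2 + x3 X3); assumes x3 is nonzero."""
    x1, x2, x3 = x
    c, s = np.cos(x3), np.sin(x3)
    return np.array([[c, -s, (x2 * (c - 1.0) + x1 * s) / x3],
                     [s, c, (x1 * (1.0 - c) + x2 * s) / x3],
                     [0.0, 0.0, 1.0]])

def vee(X):
    """Coordinates (v1, v2, w) of an element of se(2)."""
    return np.array([X[0, 2], X[1, 2], X[1, 0]])

def jacobians(x, h=1e-6):
    """Central-difference left and right Jacobians in exponential coordinates."""
    x = np.asarray(x, dtype=float)
    g_inv = np.linalg.inv(se2_exp(x))
    Jl, Jr = np.zeros((3, 3)), np.zeros((3, 3))
    for i in range(3):
        dx = np.zeros(3)
        dx[i] = h
        dg = (se2_exp(x + dx) - se2_exp(x - dx)) / (2.0 * h)
        Jl[:, i] = vee(dg @ g_inv)
        Jr[:, i] = vee(g_inv @ dg)
    return Jl, Jr

x = np.array([0.4, -0.9, 1.1])
Jl, Jr = jacobians(x)
det_expected = 2.0 * (1.0 - np.cos(x[2])) / x[2] ** 2
```

Although $J_l$ and $J_r$ differ, their determinants agree, and both reduce to one as $x_3 \to 0$.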



2.1.3 Example 3: The Heisenberg Group 

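The Heisenberg group, $H(1)$, is defined by elements of the form

$$g(\alpha, \beta, \gamma) = \begin{pmatrix} 1 & \alpha & \beta \\ 0 & 1 & \gamma \\ 0 & 0 & 1 \end{pmatrix} \quad \text{where} \quad \alpha, \beta, \gamma \in \mathbb{R} \qquad (12)$$

and the operation of matrix multiplication. Therefore, the group law can be viewed in
terms of parameters as

$$g(\alpha_1, \beta_1, \gamma_1) \, g(\alpha_2, \beta_2, \gamma_2) = g(\alpha_1 + \alpha_2, \ \beta_1 + \beta_2 + \alpha_1 \gamma_2, \ \gamma_1 + \gamma_2).$$

The identity element is the identity matrix $g(0, 0, 0)$, and the inverse of an arbitrary
element $g(\alpha, \beta, \gamma)$ is

$$g^{-1}(\alpha, \beta, \gamma) = g(-\alpha, \ \alpha\gamma - \beta, \ -\gamma).$$

Basis elements for the Lie algebra are

$$X_1 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}; \quad X_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}; \quad X_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}. \qquad (13)$$

The Lie bracket, $[X_i, X_j] = X_i X_j - X_j X_i$, for these basis elements gives

$$[X_1, X_2] = [X_2, X_3] = 0 \quad \text{and} \quad [X_1, X_3] = X_2.$$

If the inner product for the Lie algebra spanned by these basis elements is defined as
$(X, Y) = \mathrm{tr}(X Y^T)$, then this basis is orthonormal: $(X_i, X_j) = \delta_{ij}$.

The group $H(1)$ is nilpotent because $(x_1 X_1 + x_2 X_2 + x_3 X_3)^n = 0$ for all $n \ge 3$. As
a result, the matrix exponential is a polynomial in the coordinates $\{x_i\}$:

$$\exp \begin{pmatrix} 0 & x_1 & x_2 \\ 0 & 0 & x_3 \\ 0 & 0 & 0 \end{pmatrix} = g(x_1, \ x_2 + \tfrac{1}{2} x_1 x_3, \ x_3). \qquad (14)$$

The parametrization in (12) can be viewed as the following product of exponentials:

$$g(\alpha, \beta, \gamma) = g(0, \beta, 0) \, g(0, 0, \gamma) \, g(\alpha, 0, 0) = \exp(\beta X_2) \exp(\gamma X_3) \exp(\alpha X_1).$$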
The logarithm is obtained by solving for each $x_i$ as a function of $\alpha, \beta, \gamma$. By inspection
this is $x_1 = \alpha$, $x_3 = \gamma$ and $x_2 = \beta - \alpha\gamma/2$. Therefore,

$$\log g(\alpha, \beta, \gamma) = \begin{pmatrix} 0 & \alpha & \beta - \alpha\gamma/2 \\ 0 & 0 & \gamma \\ 0 & 0 & 0 \end{pmatrix}.$$
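Because the Heisenberg group is nilpotent, its exponential and logarithm are finite polynomials and can be coded exactly. The sketch below is illustrative; the names `g`, `heis_exp`, and `heis_log` are hypothetical, and the group law is the one obtained by multiplying two matrices of the form (12).

```python
import numpy as np

def g(a, b, c):
    """Element of H(1) with parameters (alpha, beta, gamma)."""
    return np.array([[1.0, a, b], [0.0, 1.0, c], [0.0, 0.0, 1.0]])

def heis_exp(x1, x2, x3):
    """Matrix exponential of X = x1 X1 + x2 X2 + x3 X3.

    X is strictly upper triangular, so X^3 = 0 and the series stops at X^2.
    """
    X = np.array([[0.0, x1, x2], [0.0, 0.0, x3], [0.0, 0.0, 0.0]])
    return np.eye(3) + X + 0.5 * (X @ X)

def heis_log(a, b, c):
    """Exponential coordinates (x1, x2, x3) of g(alpha, beta, gamma)."""
    return a, b - 0.5 * a * c, c

product = g(1.0, 2.0, 3.0) @ g(4.0, 5.0, 6.0)
round_trip = heis_exp(*heis_log(0.3, -1.2, 0.8))
```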

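The Jacobian matrices for this group can be computed in either parametrization. In
terms of $\alpha, \beta, \gamma$,

$$J_r(\alpha, \beta, \gamma) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -\alpha \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad J_l(\alpha, \beta, \gamma) = \begin{pmatrix} 1 & 0 & 0 \\ -\gamma & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (15)$$

In terms of exponential coordinates,

$$J_r(\mathbf{x}) = \begin{pmatrix} 1 & 0 & 0 \\ x_3/2 & 1 & -x_1/2 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad J_l(\mathbf{x}) = \begin{pmatrix} 1 & 0 & 0 \\ -x_3/2 & 1 & x_1/2 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (16)$$

In both parametrizations

$$|\det J_r| = |\det J_l| = 1.$$
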
2.1.4 Example 4: The Special Linear Group 

The group $SL(2, \mathbb{R})$ consists of all $2 \times 2$ matrices with real entries with determinant
equal to unity. In other words, for $a, b, c, d \in \mathbb{R}$ elements of $SL(2, \mathbb{R})$ are of the form

$$g = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad \text{where} \quad ad - bc = 1.$$
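Subgroups of $SL(2, \mathbb{R})$ include matrices of the form

$$g_1(x) = \exp \begin{pmatrix} x & 0 \\ 0 & -x \end{pmatrix} = \begin{pmatrix} e^{x} & 0 \\ 0 & e^{-x} \end{pmatrix};$$

$$g_2(y) = \exp \begin{pmatrix} 0 & y \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & y \\ 0 & 1 \end{pmatrix};$$

$$g_3(\theta) = \exp \begin{pmatrix} 0 & -\theta \\ \theta & 0 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

A basis for the Lie algebra $sl(2, \mathbb{R})$ is

$$X_1 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}; \quad X_2 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}; \quad X_3 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
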

An inner product can be defined in which this basis is orthonormal. 

It can be shown that any $g \in SL(2, \mathbb{R})$ can be expressed as a product of $g_1(x)$, $g_2(y)$,
and $g_3(\theta)$. This is called an Iwasawa decomposition of $SL(2, \mathbb{R})$.

The above $g_i$ are not the only subgroups of $SL(2, \mathbb{R})$. For example, exponentiating
matrices of the form $\zeta \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ results in a subgroup of matrices of the form

$$g(\zeta) = \begin{pmatrix} \cosh\zeta & \sinh\zeta \\ \sinh\zeta & \cosh\zeta \end{pmatrix}.$$
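The Iwasawa decomposition allows one to write an arbitrary $g \in SL(2, \mathbb{R})$ in the
form HO]

$$g = u_1(\theta) \, u_2(t) \, u_3(\xi)$$

where

$$u_1(\theta) = \exp(\theta X_1) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},$$

$$u_2(t) = \exp(t X_2) = \begin{pmatrix} e^{t} & 0 \\ 0 & e^{-t} \end{pmatrix},$$

$$u_3(\xi) = \exp\left( \frac{\xi}{2} (X_3 - X_1) \right) = \begin{pmatrix} 1 & \xi \\ 0 & 1 \end{pmatrix}.$$
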
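In this parametrization the right Jacobian is

$$J_r(\theta, t, \xi) = \frac{1}{2} \begin{pmatrix} e^{2t}(1 + \xi^2) + e^{-2t} & -2\xi & -1 \\ -2\xi e^{2t} & 2 & 0 \\ e^{2t}(1 - \xi^2) - e^{-2t} & 2\xi & 1 \end{pmatrix}.$$

The left Jacobian is

$$J_l(\theta, t, \xi) = \frac{1}{2} \begin{pmatrix} 2 & 0 & -e^{2t} \\ 0 & 2\cos 2\theta & -e^{2t} \sin 2\theta \\ 0 & 2\sin 2\theta & e^{2t} \cos 2\theta \end{pmatrix}.$$

It is easy to verify that

$$|\det(J_r(\theta, t, \xi))| = |\det(J_l(\theta, t, \xi))| = \frac{1}{2} e^{2t}.$$

Hence, $SL(2, \mathbb{R})$ is unimodular (which means the determinants of the left and right
Jacobians are the same).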
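The unimodularity claim can be tested numerically for the Iwasawa parametrization. The sketch below is a check under explicit assumptions, not the author's computation: the basis is taken to be $X_1 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, $X_2 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, $X_3 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, the factors are a rotation, a diagonal scaling, and an upper-triangular shear, and all function names are hypothetical.

```python
import numpy as np

def iwasawa(q):
    """g = u1(theta) u2(t) u3(xi) for q = (theta, t, xi)."""
    th, t, xi = q
    u1 = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
    u2 = np.diag([np.exp(t), np.exp(-t)])
    u3 = np.array([[1.0, xi], [0.0, 1.0]])
    return u1 @ u2 @ u3

def vee(M):
    """Coordinates of a traceless 2x2 matrix M = m1 X1 + m2 X2 + m3 X3
    in the assumed basis, where M = [[m2, m3 - m1], [m3 + m1, -m2]]."""
    return np.array([(M[1, 0] - M[0, 1]) / 2.0, M[0, 0], (M[1, 0] + M[0, 1]) / 2.0])

def jacobians(q, h=1e-6):
    """Central-difference left and right Jacobians of the Iwasawa chart."""
    q = np.asarray(q, dtype=float)
    g_inv = np.linalg.inv(iwasawa(q))
    Jl, Jr = np.zeros((3, 3)), np.zeros((3, 3))
    for i in range(3):
        dq = np.zeros(3)
        dq[i] = h
        dg = (iwasawa(q + dq) - iwasawa(q - dq)) / (2.0 * h)
        Jl[:, i] = vee(dg @ g_inv)
        Jr[:, i] = vee(g_inv @ dg)
    return Jl, Jr

q = np.array([0.6, 0.3, -0.4])
Jl, Jr = jacobians(q)
det_expected = 0.5 * np.exp(2.0 * q[1])
```

With this basis normalization both determinants equal $\frac{1}{2} e^{2t}$ in absolute value; a differently scaled basis would change the constant but not the equality of the two determinants.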



2.2 Generalizations 

Whereas several low-dimensional examples of Lie groups were presented to make the 
discussion concrete, a vast variety of different kinds of Lie groups exist. For example, 
the same constraints that were used to define $SO(3)$ relative to $\mathbb{R}^{3 \times 3}$ can be used to
define $SO(n)$ from $\mathbb{R}^{n \times n}$. The result is a Lie group of dimension $n(n - 1)/2$ that has
a natural volume element $dR$. Similarly, the Euclidean motion group generalizes as all
$(n + 1) \times (n + 1)$ matrices of the form

$$g = \begin{pmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{pmatrix}$$

resulting in $SE(n)$ having dimension $n(n + 1)/2$ and natural volume element $dg = dR \, d\mathbf{t}$
where $\mathbf{t} \in \mathbb{R}^n$ and $d\mathbf{t} = dt_1 dt_2 \cdots dt_n$ is the natural integration measure for $\mathbb{R}^n$. The
following subsections briefly review the general theory of Lie groups that will be relevant
when defining information-theoretic inequalities.

2.2.1 Exponential, Logarithm, and Vee Operation

In general an $n$-dimensional real matrix Lie algebra is defined by a basis consisting of
real matrices $\{X_i\}$ for $i = 1, \ldots, n$ that is closed under the matrix commutator. That
is, $[X_i, X_j] = \sum_{k=1}^{n} C_{ij}^{k} X_k$ for some real numbers $\{C_{ij}^{k}\}$, which are called the structure
constants of the Lie algebra.

In a neighborhood around the identity of the corresponding Lie group, the parametrization

$$g(x_1, \ldots, x_n) = \exp X \quad \text{where} \quad X = \sum_{i=1}^{n} x_i X_i \qquad (18)$$

is always valid. And in fact, for the examples discussed, this parametrization is good
over almost the whole group, with the exception of a set of measure zero.
The logarithm map

$$\log g(\mathbf{x}) = X$$

(which is the inverse of the exponential) is valid except on this set of measure zero. It
will be convenient in the analysis to follow to identify a vector $\mathbf{x} \in \mathbb{R}^n$ as

$$\mathbf{x} = (\log g)^{\vee} \quad \text{where} \quad (X_i)^{\vee} = \mathbf{e}_i. \qquad (19)$$

Here $\{\mathbf{e}_i\}$ is the natural basis for $\mathbb{R}^n$.

In terms of quantities that have been defined in the examples, the adjoint matrices
$Ad$ and $ad$ are the following matrix-valued functions:

$$Ad(g) = J_l J_r^{-1} \quad \text{and} \quad ad(X) = \log Ad(e^{X}). \qquad (20)$$

The dimensions of these square matrices are the same as the dimension of the Lie group,
which can be very different from the dimensions of the matrices that are used to represent
the elements of the group. The function $\Delta(g) = \det Ad(g)$ is called the modular function
of $G$. For a unimodular Lie group, $\Delta(g) = 1$.
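The adjoint matrix built from Jacobians can be compared against its defining action, $Ad(g) X^{\vee} = (g X g^{-1})^{\vee}$. The sketch below does this for $SE(2)$ in $(x, y, \theta)$ coordinates. The closed-form Jacobians hard-coded in `adjoint` are what the column-wise definitions yield for this chart and should be read as an assumption of the sketch; all function names are hypothetical.

```python
import numpy as np

def se2(x, y, th):
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def hat(v):
    """Element v1 X1 + v2 X2 + w X3 of se(2) for v = (v1, v2, w)."""
    return np.array([[0.0, -v[2], v[0]], [v[2], 0.0, v[1]], [0.0, 0.0, 0.0]])

def vee(X):
    return np.array([X[0, 2], X[1, 2], X[1, 0]])

def adjoint(x, y, th):
    """Ad(g) = J_l J_r^{-1}, using assumed closed-form SE(2) Jacobians."""
    c, s = np.cos(th), np.sin(th)
    Jl = np.array([[1.0, 0.0, y], [0.0, 1.0, -x], [0.0, 0.0, 1.0]])
    Jr = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])
    return Jl @ np.linalg.inv(Jr)

x, y, th = 0.7, -1.3, 0.5
g = se2(x, y, th)
Ad = adjoint(x, y, th)
v = np.array([0.2, 0.9, -0.6])
conjugated = vee(g @ hat(v) @ np.linalg.inv(g))
modular_function = np.linalg.det(Ad)
```

The determinant of `Ad` is one at every sample point, consistent with the modular function of a unimodular group being identically one.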



2.2.2 Integration and Differentiation on Unimodular Lie Groups

Unimodular Lie groups are defined by the fact that their integration measures are
invariant under shifts and inversions. In any parametrization, this measure (or the
corresponding volume element) can be expressed as in the examples by first computing
a left or right Jacobian matrix and then setting $dg = |J(\mathbf{q})| \, dq_1 dq_2 \cdots dq_n$ where $n$ is the
dimension of the group. In the special case when $\mathbf{q} = \mathbf{x}$ is the exponential coordinates,
then [37]

$$\int_G f(g) \, dg = \int_{\mathcal{G}} f(e^{X}) \det\left( \frac{I - e^{-ad(X)}}{ad(X)} \right) d\mathbf{x}$$

where $\mathbf{x} = X^{\vee}$ and $d\mathbf{x} = dx_1 dx_2 \cdots dx_n$. In the above expression it makes sense to
write the division of one matrix by another because the involved matrices commute.
The symbol $\mathcal{G}$ is used to denote the Lie algebra corresponding to $G$. In practice the
integral is performed over a subset of $\mathcal{G}$, which is equivalent to defining $f(e^{X})$ to be zero
over some portion of $\mathcal{G}$.

Let $f(g)$ be a probability density function (or pdf for short) on a Lie group $G$. Then

$$\int_G f(g) \, dg = 1 \quad \text{and} \quad f(g) \ge 0.$$

It can be shown that unimodularity implies the following equalities for arbitrary $h \in G$,
which generally do not all hold simultaneously for measures on nonunimodular Lie
groups:

$$\int_G f(g^{-1}) \, dg = \int_G f(h \circ g) \, dg = \int_G f(g \circ h) \, dg = \int_G f(g) \, dg. \qquad (21)$$



Many different kinds of unimodular Lie groups exist. For example, $SO(3)$ is compact
and therefore has finite volume; $SE(2)$ belongs to a class of Lie groups that are called
solvable; $H(1)$ belongs to a class called nilpotent; and $SL(2, \mathbb{R})$ belongs to a class called
semisimple. Each of these classes of Lie groups has been studied extensively. But for
the purpose of this discussion, it is sufficient to treat them all within the larger class of
unimodular Lie groups.

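Given a function $f(g)$, the left and right Lie derivatives are defined with respect to
any basis element of the Lie algebra $X_i \in \mathcal{G}$ as

$$X_i^r f(g) = \left. \frac{d}{dt} f(g \circ \exp(t X_i)) \right|_{t=0} \quad \text{and} \quad X_i^l f(g) = \left. \frac{d}{dt} f(\exp(-t X_i) \circ g) \right|_{t=0}. \qquad (22)$$
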

The use of $l$ and $r$ mimics the way that the subscripts were used in the Jacobians
$J_l$ and $J_r$ in the sense that if $\exp(t X_i)$ appears on the left/right then the corresponding
derivative is given an $l$/$r$ designation. This notation, while not standard in the mathematics
literature, is useful in computations because when evaluating left/right Lie
derivatives in coordinates $g = g(\mathbf{q})$, the left/right Jacobians enter in the computation
as [H]

$$\mathbf{X}^r f = [J_r(\mathbf{q})]^{-T} \nabla_{\mathbf{q}} f \quad \text{and} \quad \mathbf{X}^l f = -[J_l(\mathbf{q})]^{-T} \nabla_{\mathbf{q}} f \qquad (23)$$

where $\mathbf{X}^r = [X_1^r, \ldots, X_n^r]^T$, $\mathbf{X}^l = [X_1^l, \ldots, X_n^l]^T$, and $\nabla_{\mathbf{q}} = [\partial/\partial q_1, \ldots, \partial/\partial q_n]^T$ is the
gradient operator treating $\mathbf{q}$ like Cartesian coordinates.
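The coordinate formulas for the Lie derivatives can be checked against the definitions on the Heisenberg group, where the group law is polynomial. The sketch below is illustrative and rests on stated assumptions: group elements are stored as coordinate triples $(\alpha, \beta, \gamma)$ with product $(\alpha_1 + \alpha_2, \beta_1 + \beta_2 + \alpha_1\gamma_2, \gamma_1 + \gamma_2)$, $\exp(t X_i)$ equals the element with coordinate vector $t\mathbf{e}_i$, the Jacobians are those of $H(1)$ in these coordinates, and the test function `f` is arbitrary and hypothetical.

```python
import numpy as np

def mul(p, q):
    """Group law of H(1) in (alpha, beta, gamma) coordinates."""
    return np.array([p[0] + q[0], p[1] + q[1] + p[0] * q[2], p[2] + q[2]])

def f(p):
    """Arbitrary smooth test function on the group (hypothetical)."""
    a, b, c = p
    return np.sin(a) * b + np.exp(-c ** 2) * a * b

def grad(p, h=1e-6):
    """Central-difference gradient, treating the coordinates as Cartesian."""
    return np.array([(f(p + h * e) - f(p - h * e)) / (2.0 * h) for e in np.eye(3)])

def lie_derivatives(p, h=1e-6):
    """Right and left Lie derivatives directly from their definitions."""
    right, left = np.zeros(3), np.zeros(3)
    for i, e in enumerate(np.eye(3)):
        # d/dt f(g o exp(t X_i)) at t = 0
        right[i] = (f(mul(p, h * e)) - f(mul(p, -h * e))) / (2.0 * h)
        # d/dt f(exp(-t X_i) o g) at t = 0
        left[i] = (f(mul(-h * e, p)) - f(mul(h * e, p))) / (2.0 * h)
    return right, left

p = np.array([0.3, -0.5, 0.8])
right, left = lie_derivatives(p)
Jr = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, -p[0]], [0.0, 0.0, 1.0]])
Jl = np.array([[1.0, 0.0, 0.0], [-p[2], 1.0, 0.0], [0.0, 0.0, 1.0]])
right_from_jacobian = np.linalg.inv(Jr).T @ grad(p)
left_from_jacobian = -np.linalg.inv(Jl).T @ grad(p)
```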



2.3 Probability Theory and Harmonic Analysis on Unimodular 
Lie Groups 

Given two probability density functions $f_1(g)$ and $f_2(g)$, their convolution is

$$(f_1 * f_2)(g) = \int_G f_1(h) f_2(h^{-1} \circ g) \, dh. \qquad (24)$$

Here $h \in G$ is a dummy variable of integration. Convolution inherits associativity from
the group operation, but since in general $g_1 \circ g_2 \neq g_2 \circ g_1$, $(f_1 * f_2)(g) \neq (f_2 * f_1)(g)$.

For a unimodular Lie group, the convolution integral of the form in (24) can be
written in the following equivalent ways:

$$(f_1 * f_2)(g) = \int_G f_1(z^{-1}) f_2(z \circ g) \, dz = \int_G f_1(g \circ k^{-1}) f_2(k) \, dk \qquad (25)$$

where the substitutions $z = h^{-1}$ and $k = h^{-1} \circ g$ have been made, and the invariance
of integration under shifts and inversions in (21) is used.
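The structural properties of group convolution can be illustrated on a small finite group, where the integral becomes a sum and the counting measure is invariant under shifts and inversions. The sketch below uses the symmetric group $S_3$ purely as a discrete stand-in for a noncommutative unimodular Lie group; this substitution, the sample distributions, and all names are assumptions of the example.

```python
import itertools
import numpy as np

ELEMS = list(itertools.permutations(range(3)))   # the six elements of S3
INDEX = {p: i for i, p in enumerate(ELEMS)}

def compose(p, q):
    """Group product (p o q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0, 0, 0]
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def convolve(f1, f2):
    """(f1 * f2)(g) = sum_h f1(h) f2(h^{-1} o g)."""
    out = np.zeros(len(ELEMS))
    for g in ELEMS:
        for h in ELEMS:
            out[INDEX[g]] += f1[INDEX[h]] * f2[INDEX[compose(inverse(h), g)]]
    return out

def convolve_alt(f1, f2):
    """Equivalent form sum_k f1(g o k^{-1}) f2(k)."""
    out = np.zeros(len(ELEMS))
    for g in ELEMS:
        for k in ELEMS:
            out[INDEX[g]] += f1[INDEX[compose(g, inverse(k))]] * f2[INDEX[k]]
    return out

f1 = np.array([0.5, 0.3, 0.2, 0.0, 0.0, 0.0])
f2 = np.array([0.1, 0.0, 0.6, 0.0, 0.3, 0.0])
f3 = np.array([0.0, 0.25, 0.0, 0.25, 0.0, 0.5])

f12 = convolve(f1, f2)
f21 = convolve(f2, f1)
```

The two orders of convolution give different distributions, while regrouping a triple convolution does not change the result.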

A powerful generalization of classical Fourier analysis exists. It is built on families 
of unitary matrix-valued functions of group-valued argument that are parametrized by 
values $\lambda$ drawn from a set $\hat{G}$ and satisfy the homomorphism property:

$$U(g_1 \circ g_2, \lambda) = U(g_1, \lambda) U(g_2, \lambda). \qquad (26)$$
Using $*$ to denote the Hermitian conjugate, it follows that

$$I = U(e, \lambda) = U(g^{-1} \circ g, \lambda) = U(g^{-1}, \lambda) U(g, \lambda),$$

and so

$$U(g^{-1}, \lambda) = (U(g, \lambda))^{-1} = U^{*}(g, \lambda).$$

In this generalized Fourier analysis (called noncommutative harmonic analysis) each
$U(g, \lambda)$ is constructed to be irreducible in the sense that it is not possible to simultaneously
block-diagonalize $U(g, \lambda)$ by the same similarity transformation for all values
of $g$ in the group. Such a matrix function $U(g, \lambda)$ is called an irreducible unitary representation
(IUR). Completeness of a set of representations means that every (reducible)
representation can be decomposed into a direct sum of the representations in the set.

Once a complete set of IURs is known for a unimodular Lie group, the Fourier
transform of a function on that group can be defined as

$$\hat{f}(\lambda) = \int_G f(g) U(g^{-1}, \lambda) \, dg.$$

Here $\lambda$ (which can be thought of as frequency) indexes the complete set of all IURs. An
inversion formula can be used to recover the original function from all of the Fourier
transforms as

$$f(g) = \int_{\hat{G}} \mathrm{trace}[\hat{f}(\lambda) U(g, \lambda)] \, d(\lambda). \qquad (27)$$

The integration measure $d(\lambda)$ on the dual (frequency) space $\hat{G}$ is very different from
one group to another. In the case of a compact Lie group, $\hat{G}$ is discrete, and the
resulting inversion formula is a series, much like the classical Fourier series for $2\pi$-periodic
functions.

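A convolution theorem follows from (26) as

$$\widehat{(f_1 * f_2)}(\lambda) = \hat{f}_2(\lambda) \hat{f}_1(\lambda)$$

and so does the Parseval/Plancherel formula:

$$\int_G |f(g)|^2 \, dg = \int_{\hat{G}} \|\hat{f}(\lambda)\|^2 \, d(\lambda). \qquad (28)$$

Here $\| \cdot \|$ is the Hilbert-Schmidt (Frobenius) norm, and $d(\lambda)$ is the dimension of the
matrix $U(g, \lambda)$.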
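In the simplest commutative setting these statements reduce to familiar facts about the discrete Fourier transform. The sketch below uses the cyclic group $\mathbb{Z}_N$ as a stand-in, where every irreducible unitary representation is the scalar $e^{2\pi i k g / N}$ and NumPy's `np.fft.fft` plays the role of the transform with $U(g^{-1}, \lambda)$ in the integrand. The choice of group and the function name `circ_convolve` are assumptions of this example; the operator ordering in the convolution theorem is invisible here because scalars commute.

```python
import numpy as np

def circ_convolve(f1, f2):
    """(f1 * f2)[g] = sum_h f1[h] f2[(g - h) mod N] on the cyclic group Z_N."""
    N = len(f1)
    return np.array([sum(f1[h] * f2[(g - h) % N] for h in range(N))
                     for g in range(N)])

rng = np.random.default_rng(0)
f1 = rng.random(8)
f1 /= f1.sum()
f2 = rng.random(8)
f2 /= f2.sum()

lhs = np.fft.fft(circ_convolve(f1, f2))
rhs = np.fft.fft(f2) * np.fft.fft(f1)

# Parseval/Plancherel with NumPy's unnormalized forward transform
energy_group = np.sum(np.abs(f1) ** 2)
energy_dual = np.sum(np.abs(np.fft.fft(f1)) ** 2) / len(f1)
```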

A useful definition is 

$$u(X_i, \lambda) = \left. \frac{d}{dt} U(\exp(t X_i), \lambda) \right|_{t=0}.$$

Explicit expressions for $U(g, \lambda)$ and $u(X_i, \lambda)$ using the exponential map and corresponding
parameterizations for the groups $SO(3)$, $SE(2)$ and $SE(3)$ are given in [581132] .

As a consequence of these definitions, it can be shown that the following operational
properties result [24]:

$$\widehat{X_i^r f} = u(X_i, \lambda) \hat{f}(\lambda) \quad \text{and} \quad \widehat{X_i^l f} = -\hat{f}(\lambda) u(X_i, \lambda).$$

This is very useful in probability problems because a diffusion equation with drift of 
the form 

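$$\frac{\partial \rho(g; t)}{\partial t} = -\sum_{i=1}^{n} h_i(t) X_i^r \rho(g; t) + \frac{1}{2} \sum_{i,j=1}^{n} D_{ij} X_i^r X_j^r \rho(g; t) \qquad (29)$$

(where $D = [D_{ij}]$ is symmetric and positive semidefinite and given initial conditions
$\rho(g; 0) = \delta(g)$) can be solved in the dual space $\hat{G}$, and then the inversion formula can
convert it back. Explicitly,

$$\rho(g; t) = \int_{\hat{G}} \mathrm{trace}[\exp(t B(\lambda)) U(g, \lambda)] \, d(\lambda) \qquad (30)$$

where

$$B(\lambda) = \frac{1}{2} \sum_{k,j=1}^{n} D_{kj} u(X_k, \lambda) u(X_j, \lambda) - \sum_{l=1}^{n} h_l u(X_l, \lambda).$$
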
The solution to this sort of diffusion equation is important as a generalization of the 
concept of a Gaussian distribution. It has been studied extensively in the case of 
$G = SE(3)$ in the context of polymer statistical mechanics and robotic manipulators [22]
[23] [82]. As will be shown shortly, some of the classical information-theoretic inequalities
that follow from the Gaussian distribution can be computed using the above analysis. 
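The Fourier-space solution can be made concrete on the circle group $SO(2)$, the simplest compact case. The sketch below rests on explicit assumptions: no drift, a scalar diffusion constant $D$, representations $U(\theta, n) = e^{in\theta}$ so that $u(X, n) = in$ and $B(n) = -\frac{1}{2} D n^2$, and densities taken with respect to the normalized measure $d\theta / 2\pi$. It compares the resulting series with a Gaussian on the line wrapped around the circle; the function names are hypothetical.

```python
import numpy as np

def heat_kernel_fourier(theta, D, t, n_max=50):
    """rho(theta; t) = sum_n exp(t B(n)) exp(i n theta), with B(n) = -D n^2 / 2."""
    n = np.arange(-n_max, n_max + 1)
    terms = np.exp(-0.5 * D * t * n ** 2) * np.exp(1j * n * theta)
    return float(np.real(np.sum(terms)))

def heat_kernel_wrapped(theta, D, t, k_max=50):
    """Gaussian of variance D t on the line, wrapped onto the circle and scaled
    by 2 pi so that it is a density with respect to d(theta) / (2 pi)."""
    k = np.arange(-k_max, k_max + 1)
    var = D * t
    gauss = np.exp(-(theta - 2.0 * np.pi * k) ** 2 / (2.0 * var))
    return float(2.0 * np.pi * np.sum(gauss) / np.sqrt(2.0 * np.pi * var))

D, t = 0.7, 1.3
samples = [0.0, 0.5, 2.0, 3.0]
fourier_values = [heat_kernel_fourier(th, D, t) for th in samples]
wrapped_values = [heat_kernel_wrapped(th, D, t) for th in samples]
```

The agreement of the two evaluations is an instance of Poisson summation, and it shows in what sense the diffusion solution generalizes the Gaussian.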

3 Properties of Entropy and Relative Entropy on 
Groups 

As defined earlier, the entropy of a pdf on a unimodular Lie group is 

$$S(f) = -\int_G f(g) \log f(g) \, dg.$$

For example, the entropy of a Gaussian distribution with covariance $\Sigma$ is

$$S(\rho(g; t)) = \log\{ (2\pi e)^{n/2} |\Sigma(t)|^{1/2} \} \qquad (31)$$

where $\log = \log_e$.

The KuUback-Leibler distance between the pdfs fi{g) and /2(g) on a Lie group G 
naturally generalizes from its form in M" as 

DKL{fi\\f2) ^ jji{9)\og dg. (32) 

As with the case of pdfs in M", -Dxl(/i||/2) > with equality when DklHWI) ~ 0. 
And if DKL{fi\\f2) — then fi{g) = /2(.9) at "almost all" values of g G G (or, in 
probability terminology "/i(.g) — f2{g) almost surely". That is, they must be the same 
up to a set of measure zero. 

Something that is not true in R" that holds for a compact Lie group is that the 
limiting distribution is the number one. If /2(.g) = 1 is the limiting distribution, then 

DKL{fl\\l) = -S{fi). 

3.1 Convolutions Generally Increase Entropy 

Theorem 3.1: Given pdfs $f_1(g)$ and $f_2(g)$ on the unimodular Lie group $G$,

$$S(f_1 * f_2) \geq \max\{S(f_1), S(f_2)\}. \qquad (33)$$

Proof: Denote the result of an $n$-fold convolution on $G$ as

$$f_{1,n}(g) = (f_1 * f_2 * f_3 * \cdots * f_n)(g).$$

Recall that a single pairwise convolution is computed as

$$(f_1 * f_2)(g) = \int_G f_1(h)\,f_2(h^{-1}\circ g)\,dh.$$

The $n$-fold convolution can be computed by performing a series of pairwise convolutions
and stringing them together using the associative law. Convolution of functions on the
group inherits associativity from the group law, which is reflected in the notation

$$f_{i,i+2}(g) = (f_i * f_{i+1} * f_{i+2})(g) = (f_i * f_{i+1,i+2})(g) = (f_{i,i+1} * f_{i+2})(g)$$

where

$$(f_i * f_{i+1,i+2})(g) = (f_i * (f_{i+1} * f_{i+2}))(g) \quad\text{and}\quad (f_{i,i+1} * f_{i+2})(g) = ((f_i * f_{i+1}) * f_{i+2})(g).$$

Johnson and Suhov [IDIIIT] proved the following result for compact Lie groups: 

$$D_{KL}(f_{1,n}\|1) - D_{KL}(f_{1,n-1}\|1) = -\int_G D_{KL}(f_{1,n-1}\,\|\,R(h)f_{1,n})\,f_n(h)\,dh \qquad (34)$$

where $(R(h)f)(g) = f(g\circ h)$ is the right shift operator. Since the integrand on the right
side of (34) is nonnegative at all values of $h$ (and in fact, strictly positive unless all $f_i(g)$
are delta functions), this indicates that

$$D_{KL}(f_{1,n}\|1) \leq D_{KL}(f_{1,n-1}\|1)$$

with equality only holding in pathological cases. And so iterated convolutions lead to

$$\lim_{n\to\infty} D_{KL}(f_{1,n}\|1) = 0 \;\Longrightarrow\; f_{1,n}(g) = 1 \text{ a.s.}$$

A noncompact group cannot have $f(g) = 1$ as a limiting distribution, and so
it does not make sense in this case to use the notation $D_{KL}(f_{1,n}\|1)$. Nevertheless,
essentially the same proof that gives (34) can be used in the more general case of
not-necessarily-compact unimodular Lie groups to show that entropy must increase as a
result of convolution. This can be observed by first expanding out $S(f_{1,n})$ as:

$$\begin{aligned}
S(f_{1,n}) &= -\int_G f_{1,n}(g)\log f_{1,n}(g)\,dg && (35)\\
&= -\int_G (f_{1,n-1} * f_n)(g)\,\log f_{1,n}(g)\,dg && (36)\\
&= -\int_G \left[\int_G f_{1,n-1}(g\circ h^{-1})\,f_n(h)\,dh\right]\log f_{1,n}(g)\,dg && (37)\\
&= -\int_G\int_G f_{1,n-1}(g\circ h^{-1})\,f_n(h)\,\log f_{1,n}(g)\,dg\,dh && (38)\\
&= -\int_G\int_G f_{1,n-1}(k)\,f_n(h)\,\log f_{1,n}(k\circ h)\,dk\,dh. && (39)
\end{aligned}$$

In going from (37) to (38) all that was done was to reverse the order of integration
(i.e., using Fubini's Theorem), and in going from (38) to (39) the change of variables
$k = g\circ h^{-1}$ is used together with the invariance of integration under shifts.
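Next, observe that

$$\begin{aligned}
S(f_{1,n-1}) &= -\int_G f_{1,n-1}(k)\log f_{1,n-1}(k)\,dk\\
&= -\left(\int_G f_{1,n-1}(k)\log f_{1,n-1}(k)\,dk\right)\left(\int_G f_n(h)\,dh\right)\\
&= -\int_G\left(\int_G f_{1,n-1}(k)\log f_{1,n-1}(k)\,dk\right)f_n(h)\,dh,
\end{aligned}$$

and so

$$\begin{aligned}
S(f_{1,n}) - S(f_{1,n-1}) &= \int_G\left(\int_G f_{1,n-1}(k)\left[\log f_{1,n-1}(k) - \log f_{1,n}(k\circ h)\right]dk\right)f_n(h)\,dh\\
&= \int_G\left(\int_G f_{1,n-1}(k)\log\frac{f_{1,n-1}(k)}{f_{1,n}(k\circ h)}\,dk\right)f_n(h)\,dh\\
&= \int_G D_{KL}(f_{1,n-1}\,\|\,R(h)f_{1,n})\,f_n(h)\,dh\\
&\geq 0.
\end{aligned}$$

Since no direct comparison between $f_{1,n}$ and the uniform distribution is made, Johnson
and Suhov's proof, as adapted above, yields

$$S(f_{1,n-1} * f_n) \geq S(f_{1,n-1}).$$

Essentially the same proof can be used to show that

$$S(f_1 * f_{2,n}) \geq S(f_{2,n}).$$

In other words, convolution in either order increases entropy.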
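The monotonic behavior under iterated convolution can be illustrated numerically on a small finite group, where sums replace integrals. The following is a minimal sketch, not part of the original development: it assumes the symmetric group $S_3$ (elements stored as permutation tuples), counting measure, and an arbitrarily chosen pmf `p`. With counting measure the uniform pmf is $1/|G|$, so the divergence from uniform is $\log|G| - S(f)$. All helper names (`mul`, `inv`, `conv`, `entropy`) are illustrative.

```python
from itertools import permutations
from math import log

# The six elements of S3, each stored as (image of 0, image of 1, image of 2).
G = list(permutations(range(3)))

def mul(a, b):
    """Group product a∘b (apply b first, then a)."""
    return tuple(a[b[i]] for i in range(3))

def inv(a):
    """Inverse permutation."""
    out = [0, 0, 0]
    for i, ai in enumerate(a):
        out[ai] = i
    return tuple(out)

def conv(p1, p2):
    """Discrete convolution (p1*p2)(g) = sum_h p1(h) p2(h^{-1}∘g)."""
    return {g: sum(p1[h] * p2[mul(inv(h), g)] for h in G) for g in G}

def entropy(p):
    """Discrete entropy with natural log; zero-probability terms contribute 0."""
    return -sum(v * log(v) for v in p.values() if v > 0)

# Illustrative pmf: mass on the identity, one transposition, and one 3-cycle.
p = {g: 0.0 for g in G}
p[(0, 1, 2)], p[(1, 0, 2)], p[(1, 2, 0)] = 0.5, 0.3, 0.2

f = dict(p)                      # f_{1,1}
entropies = [entropy(f)]
for _ in range(5):               # f_{1,n} = f_{1,n-1} * p for n = 2, ..., 6
    f = conv(f, p)
    entropies.append(entropy(f))

# Counting-measure analog of D_KL(f || uniform).
kl_to_uniform = [log(len(G)) - s for s in entropies]

for n, (s, d) in enumerate(zip(entropies, kl_to_uniform), start=1):
    print(f"n={n}  S={s:.6f}  D_KL(f||uniform)={d:.6f}")
```

The printed entropies should be nondecreasing in $n$ while the divergence from the uniform pmf shrinks, mirroring the compact-group statement above.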

3.2 Entropy Inequalities from Jensen's Inequality 

Jensen's inequality is a fundamental tool that is often used in deriving information- 
theoretic inequalities, as well as inequalities in the field of convex geometry. In the 
context of Lie groups, Jensen's inequality can be written as 

$$\Phi\left(\int_G \phi(g)\,\rho(g)\,dg\right) \leq \int_G \Phi(\phi(g))\,\rho(g)\,dg \qquad (40)$$

where $\Phi : \mathbb{R}_{\geq 0} \to \mathbb{R}$ is a convex function on the half-infinite line, $\rho(g)$ is a pdf, and $\phi(g)$
is another nonnegative measurable function on $G$.

Two important examples of $\Phi(x)$ are $\Phi_1(x) = -\log x$ and $\Phi_2(x) = +x\log x$. If $G$ is
compact, any constant function on $G$ is measurable. Letting $\rho(g) = 1$, $\phi(g) = f(g)$ and $\Phi(x) = \Phi_2(x)$
then gives $0 \leq -S(f)$ for a pdf $f(g)$. In contrast, for any unimodular Lie group, letting
$\rho(g) = f(g)$, $\phi(g) = [f(g)]^{\alpha}$ and $\Phi(x) = \Phi_1(x)$ gives

$$-\log\left(\int_G [f(g)]^{1+\alpha}\,dg\right) \leq \alpha\,S(f). \qquad (41)$$

This leads to the following theorem.

Theorem 3.2: Let $\|\hat{f}(\lambda)\|$ denote the Frobenius norm and $\|\hat{f}(\lambda)\|_2$ denote the induced
2-norm of the Fourier transform of $f(g)$ and define

$$D_2(f) = -\int_{\hat{G}} \log\|\hat{f}(\lambda)\|_2^2\,d(\lambda), \quad D(f) = -\int_{\hat{G}} \log\|\hat{f}(\lambda)\|^2\,d(\lambda), \quad \tilde{D}(f) = -\log\int_{\hat{G}} \|\hat{f}(\lambda)\|^2\,d(\lambda). \qquad (42)$$

Then

$$S(f) \geq \tilde{D}(f) \quad\text{and}\quad D(f) \leq D_2(f) \qquad (43)$$

and

$$D_2(f_1 * f_2) \geq D_2(f_1) + D_2(f_2) \quad\text{and}\quad D(f_1 * f_2) \geq D(f_1) + D(f_2). \qquad (44)$$

Furthermore, denote the unit Heaviside step function on the real line as $u(x)$ and let

$$B = \int_{\hat{G}} u(\|\hat{f}(\lambda)\|)\,d(\lambda). \quad\text{Then}\quad \tilde{D}(f) + \log B \leq D(f)/B. \qquad (45)$$

For finite groups $B = 1$ for functions that have full spectrum, and for bandlimited
expansions on other groups $B$ is finite.

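Proof: Substituting $\alpha = 1$ into (41) and using the Plancherel formula (28) yields

$$S(f) \geq -\log\left(\int_G [f(g)]^2\,dg\right) = -\log\left(\int_{\hat{G}} \|\hat{f}(\lambda)\|^2\,d(\lambda)\right) = \tilde{D}(f).$$

The fact that $-\log x$ is a decreasing function and $\|A\|_2 \leq \|A\|$ for all $A \in \mathbb{C}^{n\times n}$ gives
the second inequality in (43).

The convolution theorem together with the facts that both norms are submultiplicative,
$-\log(x)$ is a decreasing function, and the log of the product is the sum of the logs
gives

$$D(f_1 * f_2) = -\int_{\hat{G}} \log\|\widehat{(f_1 * f_2)}(\lambda)\|^2\,d(\lambda) = -\int_{\hat{G}} \log\|\hat{f}_2(\lambda)\hat{f}_1(\lambda)\|^2\,d(\lambda) \geq D(f_1) + D(f_2).$$

An identical calculation follows for $D_2$. The statement in (45) follows from the Plancherel
formula (28) and using Jensen's inequality (40) in the dual space $\hat{G}$ rather than on $G$:

$$\Phi\left(\int_{\hat{G}} \|\hat{\phi}(\lambda)\|\,\rho(\lambda)\,d(\lambda)\right) \leq \int_{\hat{G}} \Phi(\|\hat{\phi}(\lambda)\|)\,\rho(\lambda)\,d(\lambda) \quad\text{where}\quad \int_{\hat{G}} \rho(\lambda)\,d(\lambda) = 1 \text{ and } \rho(\lambda) \geq 0. \qquad (46)$$

Recognizing that when $B$ is finite $\rho(\lambda) = u(\|\hat{f}(\lambda)\|)/B$ becomes a probability measure
on this dual space, it follows that

$$\begin{aligned}
\tilde{D}(f) &= -\log\left(\int_{\hat{G}} \|\hat{f}(\lambda)\|^2\,d(\lambda)\right) = -\log\left(B\int_{\hat{G}} \|\hat{f}(\lambda)\|^2\,\rho(\lambda)\,d(\lambda)\right)\\
&\leq -\log B - \int_{\hat{G}} \log(\|\hat{f}(\lambda)\|^2)\,\rho(\lambda)\,d(\lambda) = -\log B + D(f)/B.
\end{aligned}$$

This completes the proof.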
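The Jensen-type bound (41), $-\log\int_G [f(g)]^{1+\alpha}dg \leq \alpha S(f)$, has a direct counting-measure analog for a pmf on a finite set, and this analog is easy to check numerically. The sketch below is only an illustration under that assumption (sums in place of integrals); the pmf `p` and the function name `jensen_gap` are hypothetical.

```python
from math import log

def entropy(p):
    """Discrete entropy with natural log."""
    return -sum(x * log(x) for x in p if x > 0)

def jensen_gap(p, alpha):
    """alpha*S(p) minus -log(sum_i p_i^(1+alpha)); the analog of (41) says this is >= 0."""
    lhs = -log(sum(x ** (1 + alpha) for x in p))
    return alpha * entropy(p) - lhs

p = [0.5, 0.25, 0.125, 0.125]                      # illustrative pmf
gaps = {alpha: jensen_gap(p, alpha) for alpha in (0.5, 1.0, 2.0)}
uniform_gap = jensen_gap([0.25] * 4, 1.0)          # equality case: uniform pmf

print(gaps)
print(uniform_gap)
```

For $\alpha = 1$ the left side is the discrete counterpart of the quantity that the Plancherel formula converts into a spectral expression in the proof above.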

Properties of dispersion measures similar to $D(f)$ and $D_2(f)$ were studied in [55],
but no connections to entropy were provided previously. By definition, bandlimited
expansions have $B$ finite. On the other hand, it is a classical result that for a finite
group, $\Gamma$, the Plancherel formula is (see, for example, [24]):

$$\sum_{\gamma\in\Gamma} |f(\gamma)|^2 = \frac{1}{|\Gamma|}\sum_{k=1}^{\alpha} d_k^2\,\|\hat{f}_k\|^2$$

where $\alpha$ is the number of conjugacy classes of $\Gamma$ and $d_k$ is the dimension of $\hat{f}_k$. And by
Burnside's formula $\sum_{k=1}^{\alpha} d_k^2 = |\Gamma|$, it follows that $B = 1$ when all $\hat{f}_k \neq 0$.

3.3 The Entropy Produced by Convolution on a Finite Group 
is Bounded 

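Let $\Gamma$ be a finite group with $|\Gamma|$ elements $\{g_1, \ldots, g_{|\Gamma|}\}$, and let $\rho^{\Gamma}(g_i) \geq 0$ with $\sum_{i=1}^{|\Gamma|} \rho^{\Gamma}(g_i) = 1$
define a probability density/distribution on $\Gamma$. In analogy with how convolution and
entropy are defined on a Lie group, $G$, they can also be defined on a finite group, $\Gamma$,
by using the Dirac delta function for $G$, denoted here as $\delta(g)$. If $\Gamma < G$ (i.e., if $\Gamma$ is a
subgroup of $G$), then letting

$$\rho^{G}(g) = \sum_{i=1}^{|\Gamma|} \rho^{\Gamma}(g_i)\,\delta(g_i^{-1}\circ g) = \sum_{\gamma\in\Gamma} \rho^{\Gamma}(\gamma)\,\delta(\gamma^{-1}\circ g)$$

can be used to define a pdf on $G$ that is equivalent to a pdf on $\Gamma$ in the sense that if
the convolution of two pdfs on $\Gamma$ is

$$(\rho_1^{\Gamma} * \rho_2^{\Gamma})(g_i) = \sum_{j=1}^{|\Gamma|} \rho_1^{\Gamma}(g_j)\,\rho_2^{\Gamma}(g_j^{-1}\circ g_i) \qquad (47)$$

then

$$(\rho_1^{G} * \rho_2^{G})(g) = \sum_{\gamma\in\Gamma} (\rho_1^{\Gamma} * \rho_2^{\Gamma})(\gamma)\,\delta(\gamma^{-1}\circ g). \qquad (48)$$

Given a finite group, $\Gamma$, let

$$S(\rho) = -\sum_{i=1}^{|\Gamma|} \rho(g_i)\log\rho(g_i) = -\sum_{\gamma\in\Gamma} \rho(\gamma)\log\rho(\gamma).$$

Unlike the case of differential/continuous entropy on a Lie group, $0 \leq S(\rho)$.

The following theorem describes how the discrete entropy of pdfs on $\Gamma$ behaves under
convolution. Since only finite groups are addressed, the superscript $\Gamma$ on the discrete
values $\rho(g_i)$ is dropped.

Theorem 3.3: The entropy of the convolution of two pdfs on a finite group is no less
than either of the entropies of the convolved pdfs and is no greater than the sum of
their individual entropies:

$$\max\{S(\rho_1), S(\rho_2)\} \leq S(\rho_1 * \rho_2) \leq S(\rho_1) + S(\rho_2). \qquad (49)$$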

Proof: The lower bound follows in the same way as the proof given for Theorem 3.1
with summation in place of integration. The entropy of convolved distributions on a
finite group can be bounded from above in the following way.
Since the convolution sum contains products of all pairs, and each product is nonnegative,
it follows that

$$\rho_1(g_k)\,\rho_2(g_k^{-1}\circ g_i) \leq (\rho_1 * \rho_2)(g_i)$$

for all $k \in \{1, \ldots, |\Gamma|\}$. Therefore, since log is a strictly increasing function, it follows
that

$$-S(\rho_1 * \rho_2) \geq \sum_{i=1}^{|\Gamma|}\left(\sum_{j=1}^{|\Gamma|} \rho_1(g_j)\,\rho_2(g_j^{-1}\circ g_i)\right)\log\left(\rho_1(g_k)\,\rho_2(g_k^{-1}\circ g_i)\right).$$

Since this is true for all values of $k$, we can bring the log term inside of the summation
sign and choose $k = j$. Then multiplying by $-1$, and using the properties of the log
function, we get

$$S(\rho_1 * \rho_2) \leq -\sum_{i=1}^{|\Gamma|}\sum_{j=1}^{|\Gamma|} \rho_1(g_j)\,\rho_2(g_j^{-1}\circ g_i)\log\rho_1(g_j) - \sum_{i=1}^{|\Gamma|}\sum_{j=1}^{|\Gamma|} \rho_1(g_j)\,\rho_2(g_j^{-1}\circ g_i)\log\rho_2(g_j^{-1}\circ g_i).$$

Rearranging the order of summation signs gives

$$S(\rho_1 * \rho_2) \leq -\sum_{j=1}^{|\Gamma|} \rho_1(g_j)\log\rho_1(g_j)\left(\sum_{i=1}^{|\Gamma|} \rho_2(g_j^{-1}\circ g_i)\right) - \sum_{j=1}^{|\Gamma|} \rho_1(g_j)\left(\sum_{i=1}^{|\Gamma|} \rho_2(g_j^{-1}\circ g_i)\log\rho_2(g_j^{-1}\circ g_i)\right). \qquad (50)$$

But summation of a function over a group is invariant under shifts. That is,

$$\sum_{i=1}^{|\Gamma|} F(g_j^{-1}\circ g_i) = \sum_{i=1}^{|\Gamma|} F(g_i) \quad\text{or}\quad \sum_{\gamma\in\Gamma} F(\gamma'^{-1}\circ\gamma) = \sum_{\gamma\in\Gamma} F(\gamma).$$

Hence, replacing $g_j^{-1}\circ g_i$ with $g_i$ in the terms in parentheses in (50) gives the upper
bound in (49).
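Both bounds in (49) can be checked by brute force on a small noncommutative group. The following sketch is an illustration only: it assumes the group $S_3$ represented by permutation tuples, and the two pmfs `p1` and `p2` are arbitrary choices, as are the helper names.

```python
from itertools import permutations
from math import log

G = list(permutations(range(3)))      # the six elements of S3

def mul(a, b):
    """Group product a∘b (apply b first, then a)."""
    return tuple(a[b[i]] for i in range(3))

def inv(a):
    """Inverse permutation."""
    out = [0, 0, 0]
    for i, ai in enumerate(a):
        out[ai] = i
    return tuple(out)

def conv(p1, p2):
    """(p1*p2)(g_i) = sum_j p1(g_j) p2(g_j^{-1}∘g_i), as in (47)."""
    return {g: sum(p1[h] * p2[mul(inv(h), g)] for h in G) for g in G}

def entropy(p):
    """Discrete entropy with natural log."""
    return -sum(v * log(v) for v in p.values() if v > 0)

# Two arbitrary pmfs on S3 (each list sums to one).
p1 = dict(zip(G, [0.40, 0.30, 0.10, 0.10, 0.05, 0.05]))
p2 = dict(zip(G, [0.05, 0.15, 0.20, 0.10, 0.25, 0.25]))

S1, S2 = entropy(p1), entropy(p2)
S12 = entropy(conv(p1, p2))           # entropy of p1*p2
S21 = entropy(conv(p2, p1))           # entropy of p2*p1 (the other order)

print(max(S1, S2), S12, S1 + S2)
print(max(S1, S2), S21, S1 + S2)
```

Both orders of convolution are reported because on a noncommutative group the two convolutions are generally different functions, while the bounds in (49) apply to each.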



3.4 Entropy and Decompositions 

Aside from the ability to sustain the concept of convolution, one of the fundamental ways 
that groups resemble Euclidean space is the way in which they can be decomposed. In 
analogy with the way that an integral of a function of a vector-valued argument $x \in \mathbb{R}^n$
can be decomposed into integrals over each coordinate, integrals over Lie groups can
also be decomposed in natural ways. This has implications with regard to inequalities 
involving the entropy of pdfs on Lie groups. Analogous expressions hold for finite groups, 
with volume replaced by the number of group elements. 



3.4.1 Direct Products 

Given the direct product of two groups, $G_1 \times G_2$, and a probability density $f(g_1, g_2)$
with

$$\int_{G_1}\int_{G_2} f(g_1, g_2)\,dg_1\,dg_2 = 1,$$

the corresponding entropy is

$$S_{12} = -\int_{G_1}\int_{G_2} f(g_1, g_2)\log f(g_1, g_2)\,dg_1\,dg_2.$$

In exact analogy with classical information theory, we can write

$$S_{12} \leq S_1 + S_2 \qquad (51)$$

where

$$f_1(g_1) = \int_{G_2} f(g_1, g_2)\,dg_2 \quad\text{and}\quad f_2(g_2) = \int_{G_1} f(g_1, g_2)\,dg_1,$$

and

$$S_i = -\int_{G_i} f_i(g_i)\log f_i(g_i)\,dg_i.$$

Equality in (51) holds if and only if $f(g_1, g_2) = f_1(g_1)f_2(g_2)$.

As in the case of pdfs on Euclidean space, (51) follows from the fact that the
Kullback-Leibler divergence in (32) has the property that $D_{KL}(f \,\|\, f_1 f_2) \geq 0$.
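The subadditivity statement (51), together with its equality condition, can be checked on a finite direct product. The sketch below is illustrative and makes its own assumptions: the product is taken to be $\mathbb{Z}_2 \times \mathbb{Z}_3$ with counting measure, the joint pmf is an arbitrary table, and `marginal_gap` is a hypothetical helper that returns $S_1 + S_2 - S_{12}$, which equals the divergence of the joint pmf from the product of its marginals.

```python
from math import log

def entropy(values):
    """Discrete entropy with natural log over an iterable of probabilities."""
    return -sum(v * log(v) for v in values if v > 0)

def marginal_gap(joint):
    """Return S1 + S2 - S12 for a joint pmf given as a list of rows."""
    f1 = [sum(row) for row in joint]            # marginal over the second factor
    f2 = [sum(col) for col in zip(*joint)]      # marginal over the first factor
    s12 = entropy(v for row in joint for v in row)
    return entropy(f1) + entropy(f2) - s12

# A dependent joint pmf on Z2 x Z3 (rows: Z2, columns: Z3).
joint = [[0.10, 0.20, 0.10],
         [0.25, 0.05, 0.30]]

# A product pmf built from marginals, for which equality should hold.
f1 = [0.4, 0.6]
f2 = [0.35, 0.25, 0.40]
product = [[a * b for b in f2] for a in f1]

gap_dependent = marginal_gap(joint)
gap_product = marginal_gap(product)
print(gap_dependent, gap_product)
```

The first gap is strictly positive because the rows of `joint` are not proportional to each other; the second vanishes up to rounding.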



3.4.2 Coset Decompositions 

Given a subgroup $H < G$, and any element $g \in G$, the left coset $gH$ is defined as
$gH = \{g\circ h \mid h \in H\}$. Similarly, the right coset $Hg$ is defined as $Hg = \{h\circ g \mid h \in H\}$.
In the special case when $g \in H$, the corresponding left and right cosets are equal to
$H$. More generally, for all $g \in G$, $g \in gH$, and $g_1H = g_2H$ if and only if $g_2^{-1}\circ g_1 \in H$.
Likewise for right cosets, $Hg_1 = Hg_2$ if and only if $g_1\circ g_2^{-1} \in H$. Any group is divided
into disjoint left (right) cosets, and the statement "$g_1$ and $g_2$ are in the same left (right)
coset" is an equivalence relation.

An important property of $gH$ and $Hg$ is that they have the same number of elements
as $H$. Since the group is divided into disjoint cosets, each with the same number
of elements, it follows that the number of cosets must divide without remainder the
number of elements in the group. The set of all left (or right) cosets is called the left (or
right) coset space, and is denoted as $G/H$ (or $H\backslash G$). For finite groups one writes
$|G/H| = |H\backslash G| = |G|/|H|$. This result is called Lagrange's theorem. Similar expressions
can be written for Lie groups and Lie subgroups after the appropriate concept of volume
is introduced. We will use the following well-known fact [57]:

$$\int_G f(g)\,d(g) = \int_{G/H}\left(\int_H f(g\circ h)\,d(h)\right)d(gH) \qquad (52)$$

where $g \in gH$ is taken to be the coset representative. In the special case when $f(g)$ is
a left-coset function (i.e., a function that is constant on left cosets), (52) reduces to

$$\int_G f(g)\,d(g) = \int_{G/H} F(gH)\,d(gH)$$

where it is assumed that $d(h)$ is normalized so that $\mathrm{Vol}(H) = \int_H dh = 1$, and

$$F(gH) = \int_H f(g\circ h)\,dh$$

is the value of the function $f(g)$ on each coset representative (which is the same as that
which results from averaging over the coset $gH$).
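For a finite group the coset facts above, including Lagrange's theorem and the counting-measure analog of (52), can be verified by enumeration. The following sketch is only an illustration: it assumes $G = S_3$ and takes $H$ to be the two-element subgroup generated by one transposition; the test function `f` is arbitrary and the helper names are hypothetical.

```python
from itertools import permutations

G = list(permutations(range(3)))      # the six elements of S3

def mul(a, b):
    """Group product a∘b (apply b first, then a)."""
    return tuple(a[b[i]] for i in range(3))

e, s = (0, 1, 2), (1, 0, 2)
H = [e, s]                            # subgroup generated by one transposition

# Each coset is stored as a frozenset so that equal cosets collapse to one entry.
left_cosets = {frozenset(mul(g, h) for h in H) for g in G}
right_cosets = {frozenset(mul(h, g) for h in H) for g in G}

# Finite analog of (52): the sum of f over G equals the sum, over one
# representative g per left coset, of the inner sum over h in H of f(g∘h).
f = dict(zip(G, [0.40, 0.30, 0.10, 0.10, 0.05, 0.05]))   # arbitrary test function
representatives = [min(c) for c in left_cosets]
sum_over_group = sum(f.values())
sum_over_cosets = sum(sum(f[mul(g, h)] for h in H) for g in representatives)

print(len(left_cosets), len(right_cosets), len(G) // len(H))
print(sum_over_group, sum_over_cosets)
```

Because this choice of $H$ is not a normal subgroup, the left and right coset partitions differ even though both contain $|G|/|H|$ cosets.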

Theorem 3.4: The entropy of a pdf on a unimodular Lie group is no greater than the 
sum of the marginal entropies on a subgroup and the corresponding coset space: 

$$S(f_G) \leq S(f_{G/H}) + S(f_H). \qquad (53)$$

Proof: For the moment it will be convenient to denote a function on $G$ as $f_G(g)$ (rather
than $f(g)$) and write

$$f_G(g) = f_{G/H\times H}(g) = f_{G/H\times H}(gH, e).$$

That is, a function on $G$ evaluated at $g$ can be equally described as a function on a coset,
together with a rule for extracting a specific coset representative, which in this case is
the identity. This means that given $gH$, $g$ is recovered from $g \in gH$ as $g\circ e^{-1} = g$. By
enforcing the constraint on the definition of $f_{G/H\times H}$ that

$$f_G(g\circ h) = f_{G/H\times H}(gH, h) \quad\text{and}\quad f_{G/H\times H}(H, h) \neq f_{G/H\times H}(H, e),$$

then $g$ can be recovered from $g\circ h \in gH$ as $g\circ h\circ h^{-1} = g$. Using this construction, we
can define

$$f_H(h) = \int_{G/H} f_G(g\circ h)\,d(gH) = \int_{G/H} f_{G/H\times H}(gH, h)\,d(gH)$$

and

$$f_{G/H}(gH) = \int_H f_G(g\circ h)\,dh = \int_H f_{G/H\times H}(gH, h)\,dh.$$

For example, if $G = SE(n)$ is a Euclidean motion group and $H = SO(n)$ is the
subgroup of pure rotations in $n$-dimensional Euclidean space, then $G/H \cong \mathbb{R}^n$, and we
can write

$$\int_{SE(n)} f(g)\,d(g) = \int_{SE(n)/SO(n)}\left(\int_{SO(n)} f(g\circ h)\,d(R)\right)d(t).$$

Inequality (53) then follows from the classical information inequality for the entropy of
marginal distributions, which is obtained by letting $F(g) = -f(g)\log f(g)$ and using the
nonnegativity of the Kullback-Leibler divergence

$$D(f_G(g\circ h) \,\|\, f_{G/H}\cdot f_H(h)) \geq 0$$

together with the shift-invariance of integrals on unimodular Lie groups.



3.4.3 Double Coset Decompositions 

Let $H < G$ and $K < G$. Then for any $g \in G$, the set

$$HgK = \{h\circ g\circ k \mid h \in H, k \in K\} \qquad (54)$$

is called the double coset of $H$ and $K$, and any $g' \in HgK$ (including $g' = g$) is called
a representative of the double coset. Though a double coset representative often can
be described with two or more different pairs $(h_1, k_1)$ and $(h_2, k_2)$ so that $g' = h_1\circ g\circ k_1 = h_2\circ g\circ k_2$,
we only count $g'$ once in $HgK$. Hence $|HgK| \leq |G|$, and in
general $|HgK| \neq |H|\cdot|K|$. In general, the set of all double cosets of $H$ and $K$ is
denoted $H\backslash G/K$. Hence we have the hierarchy $g \in HgK \in H\backslash G/K$. It can be shown
that membership in a double coset is an equivalence relation. That is, $G$ is partitioned
into disjoint double cosets, and for $H < G$ and $K < G$ either $Hg_1K \cap Hg_2K = \emptyset$ or
$Hg_1K = Hg_2K$.

Another interesting thing to note (when certain conditions are met) is the decomposition
of the integral of a function on a group in terms of two subgroups and a double
coset space:

$$\int_G F(g)\,d(g) = \int_K\int_{K\backslash G/H}\int_H F(k\circ g\circ h)\,d(h)\,d(KgH)\,d(k). \qquad (55)$$

A particular example of this is the integral over $SO(3)$, which can be written in
terms of Euler angles as

$$\int_{SO(3)} dg = \int_0^{2\pi}\int_0^{\pi}\int_0^{2\pi} \sin\beta\,d\alpha\,d\beta\,d\gamma = \int_{SO(2)}\int_{SO(2)\backslash SO(3)/SO(2)}\int_{SO(2)} \sin\beta\,d\alpha\,d\beta\,d\gamma.$$
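The double-coset definition (54) and the partition property can be illustrated on a finite group, where the sets can be listed explicitly. The sketch below is illustrative only: it assumes $G = S_3$ with $H = K$ equal to the two-element subgroup generated by one transposition, and the names used are hypothetical.

```python
from itertools import permutations

G = list(permutations(range(3)))      # the six elements of S3

def mul(a, b):
    """Group product a∘b (apply b first, then a)."""
    return tuple(a[b[i]] for i in range(3))

e, s = (0, 1, 2), (1, 0, 2)
H = [e, s]                            # two-element subgroup
K = [e, s]                            # same subgroup used on the right

# HgK = {h∘g∘k}; frozensets make repeated elements and repeated cosets collapse.
double_cosets = {frozenset(mul(mul(h, g), k) for h in H for k in K) for g in G}
sizes = sorted(len(c) for c in double_cosets)

print(sizes, len(H) * len(K))
```

In this example one double coset is $H$ itself, so its size differs from $|H|\cdot|K|$, while the sizes of all double cosets still add up to $|G|$.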



Theorem 3.5: The entropy of a pdf on a group is no greater than the sum of marginal 
entropies over any two subgroups and the corresponding double-coset space: 

$$S(f_G) \leq S(f_K) + S(f_{K\backslash G/H}) + S(f_H). \qquad (56)$$

Proof: Consistent with (55) it is possible to decompose a function $f_G(g)$ as

$$f_G(g) = f_{K\times K\backslash G/H\times H}(e, KgH, e) \quad\text{where}\quad f_G(k\circ g\circ h) = f_{K\times K\backslash G/H\times H}(k, KgH, h).$$

If

$$f_K(k) = \int_{K\backslash G/H}\int_H f_G(k\circ g\circ h)\,dh\,d(KgH),$$

$$f_H(h) = \int_K\int_{K\backslash G/H} f_G(k\circ g\circ h)\,d(KgH)\,dk,$$

and

$$f_{K\backslash G/H}(KgH) = \int_K\int_H f_G(k\circ g\circ h)\,dh\,dk,$$

then letting $F(g) = -f(g)\log f(g)$ and using the nonnegativity of the Kullback-Leibler divergence

$$D(f_G(k\circ g\circ h) \,\|\, f_K(k)\cdot f_{K\backslash G/H}\cdot f_H(h)) \geq 0$$

together with the shift-invariance of integrals on unimodular Lie groups gives (56).
3.4.4 Nested Coset Decompositions 

Theorem 3.6: The entropy of a pdf is no greater than the sum of entropies of its marginals 
over coset spaces defined by nested subgroups: 

$$S(f_G) \leq S(f_{G/K}) + S(f_{K/H}) + S(f_H). \qquad (57)$$

Proof: Given a subgroup $H$ of $K$, which is itself a subgroup of $G$ (that is, $H < K < G$), it is
possible to write [37]

$$\int_{G/H} F(gH)\,d(gH) = \int_{G/K}\left[\int_{K/H} F(g\circ kH)\,d(kH)\right]d(gK).$$

Therefore,

$$\int_G F(g)\,dg = \int_{G/K}\int_{K/H}\int_H F(g\circ k\circ h)\,dh\,d(kH)\,d(gK).$$

Again letting $F(g) = -f_G(g)\log f_G(g)$, it follows from the properties of Kullback-Leibler
divergence and the unimodularity of $G$ that if

$$f_{G/K}(gK) = \int_{K/H}\int_H f(g\circ k\circ h)\,dh\,d(kH),$$

$$f_{K/H}(kH) = \int_{G/K}\int_H f(g\circ k\circ h)\,dh\,d(gK),$$

and

$$f_H(h) = \int_{G/K}\int_{K/H} f(g\circ k\circ h)\,d(kH)\,d(gK)$$

then (57) follows.

3.4.5 Class Functions and Normal Subgroups 

In analogy with the way a coset is defined, the conjugate of a subgroup $H$ for a given $g \in G$
is defined as $gHg^{-1} = \{g\circ h\circ g^{-1} \mid h \in H\}$. Recall that a subgroup $N < G$ is called normal if
and only if $gNg^{-1} \subseteq N$ for all $g \in G$. This is equivalent to the conditions $g^{-1}Ng \subseteq N$, and so
we also write $gNg^{-1} = N$ and $gN = Ng$ for all $g \in G$.

A function, $\chi(g)$, that is constant on each class has the property that

$$\chi(g) = \chi(h^{-1}\circ g\circ h) \quad\text{or}\quad \chi(h\circ g) = \chi(g\circ h) \qquad (58)$$

for any $g, h \in G$. Though convolution of functions on a noncommutative group is generally
noncommutative, the special nature of class functions means that

$$\begin{aligned}
(f * \chi)(g) &= \int_G f(h)\,\chi(h^{-1}\circ g)\,dh = \int_G f(h)\,\chi(g\circ h^{-1})\,dh\\
&= \int_G \chi(k)\,f(k^{-1}\circ g)\,dk = (\chi * f)(g)
\end{aligned}$$

where the change of variables $k = g\circ h^{-1}$ is used together with the unimodularity of $G$.
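The commutation of class functions under convolution can be confirmed numerically on a finite group. The following sketch is an illustration under its own assumptions: the group is $S_3$, the class function `chi` is constant on conjugacy classes (identified here by the number of fixed points of a permutation), and `f` is an arbitrary pmf. A pair of delta functions at two non-commuting transpositions is included to show that convolution does not commute in general.

```python
from itertools import permutations

G = list(permutations(range(3)))      # the six elements of S3

def mul(a, b):
    """Group product a∘b (apply b first, then a)."""
    return tuple(a[b[i]] for i in range(3))

def inv(a):
    """Inverse permutation."""
    out = [0, 0, 0]
    for i, ai in enumerate(a):
        out[ai] = i
    return tuple(out)

def conv(p1, p2):
    """(p1*p2)(g) = sum_h p1(h) p2(h^{-1}∘g)."""
    return {g: sum(p1[h] * p2[mul(inv(h), g)] for h in G) for g in G}

def fixed_points(g):
    """Number of fixed points; in S3 this labels the conjugacy class."""
    return sum(1 for i in range(3) if g[i] == i)

# Class function: 0.40 on the identity, 0.10 on each transposition, 0.15 on each 3-cycle.
chi = {g: {3: 0.40, 1: 0.10, 0: 0.15}[fixed_points(g)] for g in G}
f = dict(zip(G, [0.40, 0.30, 0.10, 0.10, 0.05, 0.05]))   # arbitrary pmf

f_chi, chi_f = conv(f, chi), conv(chi, f)
max_diff = max(abs(f_chi[g] - chi_f[g]) for g in G)

# Two delta functions at non-commuting transpositions do not commute.
d1 = {g: 1.0 if g == (1, 0, 2) else 0.0 for g in G}
d2 = {g: 1.0 if g == (0, 2, 1) else 0.0 for g in G}
deltas_commute = conv(d1, d2) == conv(d2, d1)

print(max_diff, deltas_commute)
```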

3.5 When Inequivalent Convolutions Produce Equal Entropy 

In general $(\rho_1 * \rho_2)(g) \neq (\rho_2 * \rho_1)(g)$. Even so, it can be the case that $S(\rho_1 * \rho_2) = S(\rho_2 * \rho_1)$.
This section addresses several special cases when this equality holds.

Let $G$ denote a unimodular Lie group and for arbitrary $g, g_1 \in G$ define $\rho'(g) = \rho(g^{-1})$,
$L_{g_1}\rho(g) = \rho(g_1^{-1}\circ g)$, $R_{g_1}\rho(g) = \rho(g\circ g_1)$, $C_{g_1}\rho(g) = \rho(g_1^{-1}\circ g\circ g_1)$. Then if $\rho(g)$ is a pdf,
it follows immediately from (21) that $\rho'(g)$, $L_{g_1}\rho(g)$, $R_{g_1}\rho(g)$, and $C_{g_1}\rho(g)$ are all pdfs. A
function for which $\rho'(g) = \rho(g)$ is called symmetric, whereas a function for which $C_{g_1}\rho(g) = \rho(g)$
for all $g_1 \in G$ is a class function (i.e., it is constant on conjugacy classes).

Theorem 3.7: For arbitrary pdfs on a unimodular Lie group $G$ and arbitrary $g_1, g_2 \in G$,

$$\rho_1 * \rho_2 \neq \rho_2' * \rho_1' \neq L_{g_1}\rho_1 * R_{g_2}\rho_2 \neq C_{g_1}\rho_1 * C_{g_1}\rho_2,$$

however, entropy satisfies the following equalities

$$S(\rho_1 * \rho_2) = S(\rho_2' * \rho_1') = S(L_{g_1}\rho_1 * R_{g_2}\rho_2) = S(C_{g_1}\rho_1 * C_{g_1}\rho_2). \qquad (59)$$

Proof: Each equality is proven by changing variables and using the unimodularity property
in (21).

$$\begin{aligned}
(\rho_2' * \rho_1')(g) &= \int_G \rho_2'(h)\,\rho_1'(h^{-1}\circ g)\,dh = \int_G \rho_2(h^{-1})\,\rho_1(g^{-1}\circ h)\,dh\\
&= \int_G \rho_1(g^{-1}\circ k^{-1})\,\rho_2(k)\,dk = (\rho_1 * \rho_2)(g^{-1}) = (\rho_1 * \rho_2)'(g).
\end{aligned}$$

Let $F[\rho] = -\rho\log\rho$. Then due to (21), the integral over $G$ of $F[\rho(g^{-1})]$ must be the same
as that of $F[\rho(g)]$, proving the first equality in (59). The second equality follows from the fact that
$(L_{g_1}\rho_1 * R_{g_2}\rho_2)(g) = (\rho_1 * \rho_2)(g_1^{-1}\circ g\circ g_2)$ and the integral of $F[\rho(g_1^{-1}\circ g\circ g_2)]$ must be the same
as that of $F[\rho(g)]$, again due to (21). The final equality follows in a similar way from the fact that
$(C_{g_1}\rho_1 * C_{g_1}\rho_2)(g) = (\rho_1 * \rho_2)(g_1^{-1}\circ g\circ g_1)$.

Note that the equalities in (59) can be combined. For example,

$$S(\rho_1 * \rho_2) = S(L_{g_1}\rho_2' * R_{g_2}\rho_1') = S(C_{g_1}\rho_2' * C_{g_1}\rho_1').$$
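A small finite example helps separate what is guaranteed from what is not. Writing $\rho'(g) = \rho(g^{-1})$, the entropy of $\rho_1 * \rho_2$ always equals that of $\rho_2' * \rho_1'$ and of shifted versions, but it need not equal the entropy of $\rho_2 * \rho_1$. The sketch below is an illustration only, assuming $G = S_3$ with $\rho_1$ uniform on a two-element subgroup $H$ and $\rho_2$ uniform on a left coset of $H$; all helper names are hypothetical.

```python
from itertools import permutations
from math import log

G = list(permutations(range(3)))      # the six elements of S3

def mul(a, b):
    """Group product a∘b (apply b first, then a)."""
    return tuple(a[b[i]] for i in range(3))

def inv(a):
    """Inverse permutation."""
    out = [0, 0, 0]
    for i, ai in enumerate(a):
        out[ai] = i
    return tuple(out)

def conv(p1, p2):
    """(p1*p2)(g) = sum_h p1(h) p2(h^{-1}∘g)."""
    return {g: sum(p1[h] * p2[mul(inv(h), g)] for h in G) for g in G}

def entropy(p):
    """Discrete entropy with natural log."""
    return -sum(v * log(v) for v in p.values() if v > 0)

def reflect(p):
    """p'(g) = p(g^{-1})."""
    return {g: p[inv(g)] for g in G}

def left_shift(p, g1):
    """(L_{g1} p)(g) = p(g1^{-1}∘g)."""
    return {g: p[mul(inv(g1), g)] for g in G}

def right_shift(p, g1):
    """(R_{g1} p)(g) = p(g∘g1)."""
    return {g: p[mul(g, g1)] for g in G}

e, s, t = (0, 1, 2), (1, 0, 2), (0, 2, 1)

p1 = {g: 0.0 for g in G}              # uniform on the subgroup H = {e, s}
p1[e] = p1[s] = 0.5
p2 = {g: 0.0 for g in G}              # uniform on the left coset tH = {t, t∘s}
p2[t] = p2[mul(t, s)] = 0.5

S12 = entropy(conv(p1, p2))
S21 = entropy(conv(p2, p1))
S_reflected = entropy(conv(reflect(p2), reflect(p1)))
S_shifted = entropy(conv(left_shift(p1, s), right_shift(p2, t)))

print(S12, S_reflected, S_shifted, S21)
```

Here $\rho_1 * \rho_2$ is uniform on a four-element double coset, whereas $\rho_2 * \rho_1$ is uniform on the two-element coset $tH$, so the two orders give different entropies unless additional structure (such as a class function or symmetry) is present.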

Theorem 3.8: The equality $S(\rho_1 * \rho_2) = S(\rho_2 * \rho_1)$ holds for pdfs $\rho_1(g)$ and $\rho_2(g)$ on a
unimodular Lie group $G$ in the following cases: (a) $\rho_i(g)$ for $i = 1$ or $i = 2$ is a class function;
(b) $\rho_i(g)$ for $i = 1, 2$ are both symmetric functions.

Proof: Statement (a) follows from the fact that if either $\rho_1$ or $\rho_2$ is a class function, then
convolutions commute. Statement (b) follows from the first equality in (59) and the definition
of a symmetric function.

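Theorem 3.9: Given class functions $\chi_1(g)$ and $\chi_2(g)$ that are pdfs, then for general $g_1, g_2 \in G$,

$$(\chi_1 * \chi_2)(g) \neq (L_{g_1}\chi_1 * L_{g_2}\chi_2)(g) \neq (R_{g_1}\chi_1 * R_{g_2}\chi_2)(g) \neq (R_{g_1}\chi_1 * L_{g_2}\chi_2)(g)$$

and yet

$$S(\chi_1 * \chi_2) = S(L_{g_1}\chi_1 * L_{g_2}\chi_2) = S(R_{g_1}\chi_1 * R_{g_2}\chi_2) = S(R_{g_1}\chi_1 * L_{g_2}\chi_2). \qquad (60)$$

Proof:

Here the first and final equality will be proven. The middle one follows in the same way.

$$\begin{aligned}
(L_{g_1}\chi_1 * L_{g_2}\chi_2)(g) &= \int_G (L_{g_1}\chi_1)(h)\,(L_{g_2}\chi_2)(h^{-1}\circ g)\,dh = \int_G \chi_1(g_1^{-1}\circ h)\,\chi_2(g_2^{-1}\circ h^{-1}\circ g)\,dh\\
&= \int_G \chi_1(k)\,\chi_2(g_2^{-1}\circ k^{-1}\circ g_1^{-1}\circ g)\,dk = \int_G \chi_1(k)\,\chi_2(k^{-1}\circ g_1^{-1}\circ g\circ g_2^{-1})\,dk\\
&= (\chi_1 * \chi_2)(g_1^{-1}\circ g\circ g_2^{-1}).
\end{aligned}$$

Similarly,

$$\begin{aligned}
(R_{g_1}\chi_1 * L_{g_2}\chi_2)(g) &= \int_G (R_{g_1}\chi_1)(h)\,(L_{g_2}\chi_2)(h^{-1}\circ g)\,dh = \int_G \chi_1(h\circ g_1)\,\chi_2(g_2^{-1}\circ h^{-1}\circ g)\,dh\\
&= \int_G \chi_1(k)\,\chi_2(g_2^{-1}\circ g_1\circ k^{-1}\circ g)\,dk = \int_G \chi_1(k)\,\chi_2(k^{-1}\circ g\circ g_2^{-1}\circ g_1)\,dk\\
&= (\chi_1 * \chi_2)(g\circ g_2^{-1}\circ g_1).
\end{aligned}$$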

4 Fisher Information and Diffusions on Lie Groups 

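The natural extension of the Fisher information matrix for the case when $f(g;\theta)$ is a parametric
distribution on a Lie group is

$$F_{ij}(f,\theta) = \int_G \frac{1}{f}\,\frac{\partial f}{\partial\theta_i}\,\frac{\partial f}{\partial\theta_j}\,dg. \qquad (61)$$

In the case when $\theta$ parameterizes $G$ as $g(\theta) = \exp(\sum_i \theta_i X_i)$ and $f(g,\theta) = f(g\circ\exp(\sum_i \theta_i X_i))$,
then

$$\left.\frac{\partial f}{\partial\theta_i}\right|_{\theta=0} = X_i^r f$$

and $F_{ij}(f, 0)$ becomes

$$F^r_{ij}(f) = \int_G \frac{1}{f}\,(X_i^r f)(X_j^r f)\,dg. \qquad (62)$$

In a similar way, we can define

$$F^l_{ij}(f) = \int_G \frac{1}{f}\,(X_i^l f)(X_j^l f)\,dg. \qquad (63)$$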



Theorem 4.1: The matrices (62) and (63) have the properties

$$F^r_{ij}(L(h)f) = F^r_{ij}(f) \quad\text{and}\quad F^l_{ij}(R(h)f) = F^l_{ij}(f) \qquad (64)$$

and

$$F^r_{ij}(I(f)) = F^l_{ij}(f) \quad\text{and}\quad F^l_{ij}(I(f)) = F^r_{ij}(f) \qquad (65)$$

where $(L(h)f)(g) = f(h^{-1}\circ g)$, $(R(h)f)(g) = f(g\circ h)$, and $I(f)(g) = f(g^{-1})$.

Proof: The operators $X_i^l$ and $R(h)$ commute, and likewise $X_i^r$ and $L(h)$ commute. This
together with the invariance of integration under shifts proves (64). From the definitions of $X_i^r$
and $X_i^l$, it follows that

$$\begin{aligned}
X_i^r(I(f))(g) &= \left.\left(\frac{d}{dt}\,f([g\circ\exp(tX_i)]^{-1})\right)\right|_{t=0}\\
&= \left.\left(\frac{d}{dt}\,f(\exp(-tX_i)\circ g^{-1})\right)\right|_{t=0}\\
&= (X_i^l f)(g^{-1}).
\end{aligned}$$

Using the invariance of integration under inversion of the argument then gives (65). As a special case, when $f(g)$
is a symmetric function, the left and right Fisher information matrices will be the same.

Note that the entries of Fisher matrices $F^r_{ij}(f)$ and $F^l_{ij}(f)$ implicitly depend on the choice
of orthonormal Lie algebra basis $\{X_i\}$, and so it would be more descriptive to use the notation
$F^r_{ij}(f, X)$ and $F^l_{ij}(f, X)$.

If a different orthonormal basis $\{Y_i\}$ is used, such that $X_i = \sum_k a_{ik}Y_k$, then the orthonormality
of both $\{X_i\}$ and $\{Y_i\}$ forces $A = [a_{ij}]$ to be an orthogonal matrix. Furthermore, the
linearity of the Lie derivative,

$$X^r f = \sum_i x_i\,X_i^r f \quad\text{where}\quad X = \sum_i x_i X_i,$$

means that

$$F^r_{ij}(f, X) = \int_G \frac{1}{f}\left(\sum_k a_{ik}\,Y_k^r f\right)\left(\sum_l a_{jl}\,Y_l^r f\right)dg = \sum_{k,l} a_{ik}\,a_{jl}\,F^r_{kl}(f, Y).$$

The same holds for $F^l$. Summarizing these results in matrix form:

$$F^r(f, X) = A\,F^r(f, Y)\,A^T \quad\text{and}\quad F^l(f, X) = A\,F^l(f, Y)\,A^T \quad\text{where}\quad e_i^T A\,e_j = (X_i, Y_j). \qquad (66)$$

This means that the eigenvalues of the Fisher information matrix (and therefore its trace) are
invariant under change of orthonormal basis.



4.1 Fisher Information and Convolution on Groups 

The decrease of Fisher information as a result of convolution can be studied in much the same 
way as for pdfs on Euclidean space. Two approaches are taken here. First, a straightforward 
application of the Cauchy-Bunyakovsky-Schwarz (CBS) inequality is used together with the 
bi-invariance of the integral over a unimodular Lie group to produce a bound on the Fisher 
information of the convolution of two probability densities. Then, a tighter bound is obtained 
using the concept of conditional expectation in the special case when the pdfs commute under 
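convolution.

Theorem 4.2: The following inequalities hold for the diagonal entries of the left and right
Fisher information matrices:

$$F^r_{ii}(f_1 * f_2) \leq \min\{F^r_{ii}(f_1), F^r_{ii}(f_2)\} \quad\text{and}\quad F^l_{ii}(f_1 * f_2) \leq \min\{F^l_{ii}(f_1), F^l_{ii}(f_2)\}. \qquad (67)$$

Proof: The CBS inequality holds for groups:

$$\left(\int_G a(g)\,b(g)\,dg\right)^2 \leq \int_G a^2(g)\,dg\int_G b^2(g)\,dg.$$

If $a(g) \geq 0$ for all values of $g$, then it is possible to define $j(g) = [a(g)]^{1/2}$ and $k(g) = [a(g)]^{1/2}\,b(g)$,
and since $j(g)k(g) = a(g)b(g)$,

$$\left(\int_G a(g)\,b(g)\,dg\right)^2 \leq \left(\int_G j^2(g)\,dg\right)\left(\int_G k^2(g)\,dg\right) = \left(\int_G a(g)\,dg\right)\left(\int_G a(g)\,[b(g)]^2\,dg\right). \qquad (68)$$

Using this version of the CBS inequality with $h$ as the variable of integration, and letting
$b(h) = X_i^r f_2(h^{-1}\circ g)/[f_2(h^{-1}\circ g)]$ and $a(h) = f_1(h)\,f_2(h^{-1}\circ g)$, essentially the same
manipulations as in [TB] can be used, with the
roles of $f_1$ and $f_2$ interchanged due to the fact that in general for convolution on a Lie group
$(f_1 * f_2)(g) \neq (f_2 * f_1)(g)$:

$$\begin{aligned}
F^r_{ii}(f_1 * f_2) &= \int_G \frac{\left(\int_G \left[X_i^r f_2(h^{-1}\circ g)/f_2(h^{-1}\circ g)\right]\left[f_2(h^{-1}\circ g)\,f_1(h)\right]dh\right)^2}{(f_1 * f_2)(g)}\,dg\\
&\leq \int_G \frac{\left(\int_G \left[X_i^r f_2(h^{-1}\circ g)/f_2(h^{-1}\circ g)\right]^2\left[f_2(h^{-1}\circ g)\,f_1(h)\right]dh\right)\left(\int_G f_2(h^{-1}\circ g)\,f_1(h)\,dh\right)}{(f_1 * f_2)(g)}\,dg\\
&= \int_G\left(\int_G \left\{[X_i^r f_2(h^{-1}\circ g)]^2/f_2(h^{-1}\circ g)\right\}f_1(h)\,dh\right)dg\\
&= \int_G\left(\int_G \left\{[X_i^r f_2(h^{-1}\circ g)]^2/f_2(h^{-1}\circ g)\right\}dg\right)f_1(h)\,dh\\
&= F^r_{ii}(f_2)\int_G f_1(h)\,dh\\
&= F^r_{ii}(f_2).
\end{aligned}$$

Since for a unimodular Lie group it is possible to perform changes of variables and inversion
of the variable of integration without affecting the value of an integral, the convolution can be
written in the following equivalent ways,

$$\begin{aligned}
(f_1 * f_2)(g) &= \int_G f_1(h)\,f_2(h^{-1}\circ g)\,dh && (69)\\
&= \int_G f_1(g\circ h^{-1})\,f_2(h)\,dh && (70)\\
&= \int_G f_1(g\circ h)\,f_2(h^{-1})\,dh && (71)\\
&= \int_G f_1(h^{-1})\,f_2(h\circ g)\,dh. && (72)
\end{aligned}$$

It then follows, using (70) and the bi-invariance of integration, that (67) holds.

4.1.1 A Tighter Bound Using Conditional Expectation for Commuting PDFs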

In this subsection a better inequality is derived. 

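Theorem 4.3: The following inequality holds for the right and left Fisher information matrices:

$$\mathrm{tr}[F^r(\rho_1 * \rho_2)P] \leq \mathrm{tr}[F^r(\rho_i)P] \quad\text{and}\quad \mathrm{tr}[F^l(\rho_1 * \rho_2)P] \leq \mathrm{tr}[F^l(\rho_i)P] \qquad (73)$$

where $i = 1, 2$ and $P$ is an arbitrary symmetric positive definite matrix with the same dimensions
as $F$.

Proof: Let

$$f_{12}(h, g) = \rho_1(h)\,\rho_2(h^{-1}\circ g).$$

Then

$$f_1(h) = \int_G f_{12}(h, g)\,dg = \rho_1(h) \quad\text{and}\quad f_2(g) = \int_G f_{12}(h, g)\,dh = (\rho_1 * \rho_2)(g).$$

It follows that

$$(X_i^r f_2)(g) = \int_G \rho_1(h)\,X_i^r\rho_2(h^{-1}\circ g)\,dh.$$

Then by the change of variables $k = h^{-1}\circ g$,

$$(X_i^r f_2)(g) = \int_G \rho_1(g\circ k^{-1})\,X_i^r\rho_2(k)\,dk.$$

This means that

$$\frac{(X_i^r f_2)(g)}{f_2(g)} = \int_G \frac{(X_i^r\rho_2)(k)}{\rho_2(k)}\,\frac{\rho_1(g\circ k^{-1})\,\rho_2(k)}{f_2(g)}\,dk = \left\langle\frac{(X_i^r\rho_2)(k)}{\rho_2(k)}\,\middle|\,g\right\rangle, \qquad (74)$$

where $\langle\cdot\mid g\rangle$ denotes conditional expectation. And therefore,

$$\begin{aligned}
F^r_{ii}(f_2) &= \left\langle\left(\frac{(X_i^r f_2)(g)}{f_2(g)}\right)^2\right\rangle = \left\langle\left\langle\frac{(X_i^r\rho_2)(k)}{\rho_2(k)}\,\middle|\,g\right\rangle^2\right\rangle\\
&\leq \left\langle\left\langle\left(\frac{(X_i^r\rho_2)(k)}{\rho_2(k)}\right)^2\,\middle|\,g\right\rangle\right\rangle = \left\langle\left(\frac{(X_i^r\rho_2)(k)}{\rho_2(k)}\right)^2\right\rangle\\
&= F^r_{ii}(\rho_2).
\end{aligned}$$

An analogous argument using $f_{12}(h, g) = \rho_1(g\circ h^{-1})\,\rho_2(h)$ and $f_2(g) = (\rho_1 * \rho_2)(g)$ shows
that

$$\frac{(X_i^l f_2)(g)}{f_2(g)} = \left\langle\frac{(X_i^l\rho_1)(k)}{\rho_1(k)}\,\middle|\,g\right\rangle \qquad (75)$$

and

$$F^l_{ii}(f_2) \leq F^l_{ii}(\rho_1).$$

The above results can be written concisely by introducing an arbitrary positive definite
diagonal matrix $\Lambda$ as follows:

$$\mathrm{tr}[F^r(\rho_1 * \rho_2)\Lambda] \leq \mathrm{tr}[F^r(\rho_2)\Lambda] \quad\text{and}\quad \mathrm{tr}[F^l(\rho_1 * \rho_2)\Lambda] \leq \mathrm{tr}[F^l(\rho_1)\Lambda].$$

If this is true in one basis, then using (66) the more general statement in (73) must follow in
another basis where $P = P^T > 0$. Since the initial choice of basis is arbitrary, (73) must hold
in every basis for an arbitrary positive definite matrix $P$. This completes the proof.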

In some instances, even though the group is not commutative, the functions $\rho_1$ and $\rho_2$ will
commute. For example, if $\rho(g\circ h) = \rho(h\circ g)$ for all $h, g \in G$, then $(\rho * \rho_i)(g) = (\rho_i * \rho)(g)$
for any reasonable choice of $\rho_i(g)$. Or if $\rho_2 = \rho_1 * \rho_1 * \cdots * \rho_1$ it will clearly be the case that
$\rho_1 * \rho_2 = \rho_2 * \rho_1$. If, for whatever reason, $\rho_1 * \rho_2 = \rho_2 * \rho_1$ then (73) can be rewritten in the
following form:

$$\mathrm{tr}[F^r(\rho_1 * \rho_2)P] \leq \min\{\mathrm{tr}[F^r(\rho_1)P], \mathrm{tr}[F^r(\rho_2)P]\}$$

and

$$\mathrm{tr}[F^l(\rho_1 * \rho_2)P] \leq \min\{\mathrm{tr}[F^l(\rho_1)P], \mathrm{tr}[F^l(\rho_2)P]\}. \qquad (76)$$

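Theorem 4.4: When $\rho_1 * \rho_2 = \rho_2 * \rho_1$ the following inequality holds:

$$\frac{1}{\mathrm{tr}[F^r(\rho_1 * \rho_2)P]} \geq \frac{1}{\mathrm{tr}[F^r(\rho_1)P]} + \frac{1}{\mathrm{tr}[F^r(\rho_2)P]} \quad\text{for any } P = P^T > 0, \qquad (77)$$

and likewise for $F^l$.

Proof: Returning to (74) and (75), in the case when $\rho_1 * \rho_2 = \rho_2 * \rho_1$ it is possible to write

$$\frac{(X_i^r f_2)(g)}{f_2(g)} = \left\langle\frac{(X_i^r\rho_1)(k)}{\rho_1(k)}\,\middle|\,g\right\rangle = \left\langle\frac{(X_i^r\rho_2)(k')}{\rho_2(k')}\,\middle|\,g\right\rangle$$

and

$$\frac{(X_i^l f_2)(g)}{f_2(g)} = \left\langle\frac{(X_i^l\rho_1)(k)}{\rho_1(k)}\,\middle|\,g\right\rangle = \left\langle\frac{(X_i^l\rho_2)(k')}{\rho_2(k')}\,\middle|\,g\right\rangle. \qquad (78)$$

Since the following calculation works the same way for both the 'l' and 'r' cases, consider
only the 'r' case for now. Multiplying the first equality in (78) by $\beta$ and the second by $1 - \beta$
and adding together:^1

$$\begin{aligned}
\frac{(X_i^r f_2)(g)}{f_2(g)} &= \beta\left\langle\frac{(X_i^r\rho_1)(k)}{\rho_1(k)}\,\middle|\,g\right\rangle + (1-\beta)\left\langle\frac{(X_i^r\rho_2)(k')}{\rho_2(k')}\,\middle|\,g\right\rangle\\
&= \left\langle\beta\,\frac{(X_i^r\rho_1)(k)}{\rho_1(k)} + (1-\beta)\,\frac{(X_i^r\rho_2)(k')}{\rho_2(k')}\,\middle|\,g\right\rangle
\end{aligned}$$

for arbitrary $\beta \in [0, 1]$.

Now squaring both sides and taking the (unconditional) expectation, and using Jensen's
inequality yields:

$$\begin{aligned}
\left\langle\left(\frac{(X_i^r f_2)(g)}{f_2(g)}\right)^2\right\rangle &= \left\langle\left\langle\beta\,\frac{(X_i^r\rho_1)(k)}{\rho_1(k)} + (1-\beta)\,\frac{(X_i^r\rho_2)(k')}{\rho_2(k')}\,\middle|\,g\right\rangle^2\right\rangle\\
&\leq \left\langle\left(\beta\,\frac{(X_i^r\rho_1)(k)}{\rho_1(k)} + (1-\beta)\,\frac{(X_i^r\rho_2)(k')}{\rho_2(k')}\right)^2\right\rangle\\
&= \beta^2\left\langle\left(\frac{(X_i^r\rho_1)(k)}{\rho_1(k)}\right)^2\right\rangle + (1-\beta)^2\left\langle\left(\frac{(X_i^r\rho_2)(k')}{\rho_2(k')}\right)^2\right\rangle,
\end{aligned}$$

where the cross term vanishes because $k$ and $k'$ are independent and each ratio has zero mean.
This statement simply says

$$F^r_{ii}(\rho_1 * \rho_2) \leq \beta^2\,F^r_{ii}(\rho_1) + (1-\beta)^2\,F^r_{ii}(\rho_2). \qquad (80)$$

The value of $\beta \in [0, 1]$ that gives the tightest bound is

$$\beta = \frac{F^r_{ii}(\rho_2)}{F^r_{ii}(\rho_1) + F^r_{ii}(\rho_2)},$$

resulting in the inequality

$$\frac{1}{F^r_{ii}(\rho_1 * \rho_2)} \geq \frac{1}{F^r_{ii}(\rho_1)} + \frac{1}{F^r_{ii}(\rho_2)}. \qquad (81)$$

Alternatively, if before computing the optimal $\beta$ we first multiply both sides of (80) by $\lambda_i$ and
sum over $i$, the result will be

$$\mathrm{tr}[F^r(\rho_1 * \rho_2)\Lambda] \leq \beta^2\,\mathrm{tr}[F^r(\rho_1)\Lambda] + (1-\beta)^2\,\mathrm{tr}[F^r(\rho_2)\Lambda].$$

Again, since the basis is arbitrary, $\Lambda$ can be replaced with $P$. Then the optimal value of $\beta$ will
give (77).

^1 The names of the dummy variables $k$ and $k'$ are unimportant. However, at this stage it is important
that the names be different in order to emphasize their statistical independence.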
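The optimization over $\beta$ and the resulting reciprocal inequality can be made concrete with a short worked computation. The block below is a sketch rather than part of the original argument: it writes $F_1$, $F_2$ for the two Fisher information values (a shorthand introduced here), and then specializes to the commutative group $(\mathbb{R},+)$ with Gaussian densities of variances $\sigma_1^2$ and $\sigma_2^2$, an assumed example in which the bound is attained with equality.

```latex
% Minimizing b(\beta) = \beta^2 F_1 + (1-\beta)^2 F_2 over \beta:
\frac{db}{d\beta} = 2\beta F_1 - 2(1-\beta) F_2 = 0
\quad\Longrightarrow\quad
\beta^{*} = \frac{F_2}{F_1 + F_2},
\qquad
b(\beta^{*}) = \frac{F_2^2 F_1 + F_1^2 F_2}{(F_1+F_2)^2} = \frac{F_1 F_2}{F_1 + F_2}.

% Hence F(\rho_1 * \rho_2) \le F_1 F_2 / (F_1 + F_2), i.e.
\frac{1}{F(\rho_1 * \rho_2)} \;\ge\; \frac{1}{F_1} + \frac{1}{F_2}.

% Example on (\mathbb{R},+): \rho_i = N(\mu_i, \sigma_i^2), so F_i = 1/\sigma_i^2 and
% \rho_1 * \rho_2 = N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2):
\frac{1}{F(\rho_1 * \rho_2)} = \sigma_1^2 + \sigma_2^2 = \frac{1}{F_1} + \frac{1}{F_2}.
```

The Gaussian case shows that the reciprocal form cannot be improved in general, which is the Euclidean behavior that (77) extends to commuting pdfs on a unimodular Lie group.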



4.1.2 A Special Case: SO(3)

Consider the group of $3 \times 3$ orthogonal matrices with determinant $+1$. Let $X^r = [X_1^r, X_2^r, X_3^r]^T$
and $X^l = [X_1^l, X_2^l, X_3^l]^T$. These two gradient vectors are related to each other by an adjoint
matrix, which for this group is a rotation matrix [J^. Therefore, in the case when $G = SO(3)$,

$$\|X^r f\|^2 = \|X^l f\|^2 \;\Longrightarrow\; \mathrm{tr}[F^r(f)] = \mathrm{tr}[F^l(f)].$$

Therefore, the inequalities in (76) will hold for pdfs on $SO(3)$ regardless of whether or not the
functions commute under convolution, but restricted to the condition $P = I$.

5 Generalizing the de Bruijn Identity to Lie Groups 

This section generalizes the de Bruijn identity, in which entropy rates are related to Fisher 
information. 

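Theorem 5.1: Let $f_{D,h,t}(g) = f(g, t; D, h)$ denote the solution to the diffusion equation (29)
with constant $h$ subject to the initial condition $f(g, 0; D, h) = \delta(g)$. Then for any well-behaved
pdf $\alpha(g)$,

$$\frac{d}{dt}S(\alpha * f_{D,h,t}) = \frac{1}{2}\,\mathrm{tr}[D\,F^r(\alpha * f_{D,h,t})]. \qquad (82)$$

Proof: It is easy to see that the solution of the diffusion equation

$$\frac{\partial\rho}{\partial t} = \frac{1}{2}\sum_{i,j=1}^{n} D_{ij}\,X_i^r X_j^r\rho - \sum_{k=1}^{n} h_k\,X_k^r\rho \qquad (83)$$

subject to the initial conditions $\rho(g, 0) = \alpha(g)$ is simply $\rho(g, t) = (\alpha * f_{D,h,t})(g)$. This follows
because all derivatives "pass through" the convolution integral for $\rho(g, t)$ and act on $f_{D,h,t}(g)$.
Taking the time derivative of $S(\rho(g, t))$ we get

$$\frac{d}{dt}S(\rho) = -\frac{d}{dt}\int_G \rho(g, t)\log\rho(g, t)\,dg = -\int_G\left\{\frac{\partial\rho}{\partial t}\log\rho + \frac{\partial\rho}{\partial t}\right\}dg. \qquad (84)$$

Using (83), the partial with respect to time can be replaced with Lie derivatives. But

$$\int_G X_k^r\rho\,dg = \int_G X_i^r X_j^r\rho\,dg = 0,$$

so the second term on the right side of (84) completely disappears. Using the integration-by-parts
formula^2

$$\int_G f_1\,X_k^r f_2\,dg = -\int_G f_2\,X_k^r f_1\,dg$$

with $f_1 = \log\rho$ and $f_2 = \rho$ then gives

$$\begin{aligned}
\frac{d}{dt}S(\alpha * f_{D,h,t}) &= \frac{1}{2}\int_G \frac{1}{\alpha * f_{D,h,t}}\sum_{i,j=1}^{n} D_{ij}\,X_i^r(\alpha * f_{D,h,t})\,X_j^r(\alpha * f_{D,h,t})\,dg\\
&= \frac{1}{2}\sum_{i,j=1}^{n} D_{ij}\,F^r_{ij}(\alpha * f_{D,h,t})\\
&= \frac{1}{2}\,\mathrm{tr}[D\,F^r(\alpha * f_{D,h,t})].
\end{aligned}$$

The implication of this is that

$$S(\alpha * f_{D,h,t_2}) - S(\alpha * f_{D,h,t_1}) = \frac{1}{2}\int_{t_1}^{t_2} \mathrm{tr}[D\,F^r(\alpha * f_{D,h,t})]\,dt.$$

^2 There are no surface terms because, like the circle and real line, each coordinate in the integral
either wraps around or goes to infinity.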
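A quick consistency check of the identity is available in the simplest commutative case. The computation below is a sketch under stated assumptions: the group is taken to be $(\mathbb{R},+)$, so that the single Lie derivative is $d/dx$ and the diffusion matrix reduces to a scalar $D$ with drift $h$; the initial pdf $\alpha$ is assumed Gaussian with variance $\sigma^2$.

```latex
% On (\mathbb{R},+) the diffusion equation reads
%   \partial_t \rho = -h\,\partial_x \rho + \tfrac{D}{2}\,\partial_x^2 \rho ,
% whose fundamental solution is f_{D,h,t} = N(ht,\, Dt).
% With \alpha = N(0,\sigma^2):  \alpha * f_{D,h,t} = N(ht,\ \sigma^2 + Dt).

S(\alpha * f_{D,h,t}) = \tfrac{1}{2}\log\!\big(2\pi e\,(\sigma^2 + Dt)\big),
\qquad
\frac{d}{dt}\,S(\alpha * f_{D,h,t}) = \frac{D}{2(\sigma^2 + Dt)} .

% Fisher information of a Gaussian with variance \sigma^2 + Dt:
F(\alpha * f_{D,h,t}) = \frac{1}{\sigma^2 + Dt}
\quad\Longrightarrow\quad
\tfrac{1}{2}\,D\,F(\alpha * f_{D,h,t}) = \frac{D}{2(\sigma^2 + Dt)} .
```

Both sides agree, and the drift $h$ only shifts the mean, so it affects neither the entropy nor the Fisher information, consistent with the drift term dropping out in the proof.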

6 Information-Theoretic Inequalities from Log-Sobolev 
Inequalities 

In this section information-theoretic inequalities are derived from log-Sobolev inequalities.
Subsection 6.1 provides a brief review of log-Sobolev inequalities. Subsection 6.2 then uses these
to write information-theoretic inequalities.



6.1 Log-Sobolev Inequalities in R"^ and on Lie Groups 

The log-Sobolev inequality can be stated as [71 151 l5Uj: 



|-i/'(x)| log |'i/'(x)|^dx < — log 



2 

Tren 



\\ViPfdx 



(85) 



where 



dip dip 



and 



/ |i/;(x)|^dx = 1. 



dxi ' ' dXn 

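Here $\log = \log_e$. Actually, there is a whole family of log-Sobolev inequalities, and (85)
represents the tightest of these. The original form of the log-Sobolev inequality as introduced by
Gross in [33] is

$$\frac{1}{2}\int_{\mathbb{R}^n} |\phi(x)|^2 \log|\phi(x)|^2\,\rho(x)\,dx \le \int_{\mathbb{R}^n} \|\nabla\phi(x)\|^2\,\rho(x)\,dx + \|\phi\|^2_{L^2(\mathbb{R}^n,\rho)} \log\|\phi\|_{L^2(\mathbb{R}^n,\rho)} \tag{86}$$

where

$$\|\phi\|^2_{L^2(\mathbb{R}^n,\rho)} = \int_{\mathbb{R}^n} |\phi(x)|^2\,\rho(x)\,dx.$$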



Here $\rho(x) = \rho(x, 1) = (2\pi)^{-n/2}\exp(-\|x\|^2/2)$ is the solution to the heat equation on
$\mathbb{R}^n$ evaluated at $t = 1$.

Several different variations exist. For example, by rescaling, it is possible to rewrite (86)
with $\rho(x, t)$ in place of $\rho(x)$ by introducing a multiplicative factor of $t$ in the first term on the
right hand side of the equation. Or, letting $\phi(x) = \rho^{-1/2}(x)\,\psi(x/a)$ for some scaling factor
$a > 0$, substituting into (86), and integrating by parts gives [50]



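$$\int_{\mathbb{R}^n} |\psi(x)|^2 \log\frac{|\psi(x)|^2}{\|\psi\|_2^2}\,dx + n(1 + \log a)\|\psi\|_2^2 \le \frac{a^2}{\pi}\int_{\mathbb{R}^n} \|\nabla\psi(x)\|^2\,dx$$

where

$$\|\psi\|_2^2 = \int_{\mathbb{R}^n} |\psi(x)|^2\,dx \quad\text{and}\quad \|\nabla\psi(x)\|^2 = \nabla\psi(x)\cdot\nabla\psi(x).$$

This, together with an optimization over $a$, gives (85).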
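The optimization over $a$ can be checked numerically. The sketch below assumes $\|\psi\|_2 = 1$ and the
scaled inequality in the form $\int |\psi|^2 \log|\psi|^2\,dx \le (a^2/\pi)\,G - n(1 + \log a)$ with
$G = \int \|\nabla\psi\|^2\,dx$. The prefactor $a^2/\pi$ is taken from the standard statement of this
inequality and is an assumption here, and the values of `n` and `G` are arbitrary illustrative numbers.

```python
import math

def scaled_bound(a, n, G):
    """Right-hand side of the scaled inequality for ||psi||_2 = 1 (assumed prefactor a^2/pi)."""
    return (a * a / math.pi) * G - n * (1.0 + math.log(a))

def sharp_bound(n, G):
    """Right-hand side of (85): (n/2) * log(2 G / (pi e n))."""
    return 0.5 * n * math.log(2.0 * G / (math.pi * math.e * n))

def best_a(n, G):
    """Stationary point of scaled_bound in a, from 2 a G / pi = n / a."""
    return math.sqrt(n * math.pi / (2.0 * G))

n, G = 3, 2.5                      # hypothetical dimension and gradient energy
a_star = best_a(n, G)
# Brute-force minimum over a grid of a values, as an independent check.
grid_min = min(scaled_bound(0.01 * k, n, G) for k in range(1, 1001))
print(scaled_bound(a_star, n, G), sharp_bound(n, G), grid_min)
```

Because the bound is convex in $a$ (a quadratic minus a logarithm), the stationary point is the global
minimum, and its value coincides with the right-hand side of (85).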

Gross subsequently extended (86) to Lie groups [3J as

$$\int_G \left\{|\phi(g)|^2 \log|\phi(g)|\right\}\rho(g,t)\,dg \le C_G(t)\int_G \|(\tilde{X}\phi)(g)\|^2\,\rho(g,t)\,dg + \|\phi\|^2_{L^2(G,\rho_t)} \log\|\phi\|_{L^2(G,\rho_t)} \tag{87}$$

where $\rho(g,t)$ is the solution to the diffusion equation in (83) with $h_i = 0$, $D_{ij} = \delta_{ij}$, initial
condition $\rho(g,0) = \delta(g)$, and

$$\tilde{X}\phi = [\tilde{X}_1^r\phi, \ldots, \tilde{X}_n^r\phi]^T \quad\text{and}\quad \|\phi\|^2_{L^2(G,\rho_t)} = \int_G |\phi(g)|^2\,\rho(g,t)\,dg.$$

In (87) the scalar function $C_G(t)$ depends on the particular group. For $G = (\mathbb{R}^n, +)$ we have
$C_{\mathbb{R}^n}(t) = t$, and likewise $C_{SO(n)}(t) = t$.

In analogy with the way that (85) evolved from (86), a descendant of (87) for noncompact
unimodular Lie groups is [D [7l E]

$$\int_G |\psi(g)|^2 \log|\psi(g)|^2\,dg \le \frac{n}{2}\log\left[\frac{2 C_G}{\pi e n}\int_G \|\tilde{X}\psi\|^2\,dg\right]. \tag{88}$$

The only difference is that, to the author's knowledge, the sharp factor $C_G$ in this expression is
not known for most Lie groups. The information-theoretic interpretation of these inequalities
is provided in the following subsection.



6.2 Information-Theoretic Inequalities 



For our purposes the form in (85) will be most useful. It is interesting to note in passing that
Beckner has extended this inequality to the case where the domain, rather than being $\mathbb{R}^n$, is
the hyperbolic space $\mathbb{H}^2 \cong SL(2,\mathbb{R})/SO(2)$ and the Heisenberg groups $H(n)$, including $H(1)$
[7, 8]. Our goal here is to provide an information-theoretic interpretation of the inequalities
from the previous section.



Theorem 6.1: Entropy powers and Fisher information are related as

$$[N(f)]^{-1} \le \frac{1}{n}\,\mathrm{tr}(F) \quad\text{where}\quad N(f) = \frac{C_G}{2\pi e}\exp\left[\frac{2}{n} S(f)\right]. \tag{89}$$



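Proof: We begin by proving (89) for $G = (\mathbb{R}^n, +)$. Making the simple substitution
$f(x) = |\psi(x)|^2$ into (85) and requiring that $f(x)$ be a pdf gives

$$\int_{\mathbb{R}^n} f(x)\log f(x)\,dx \le \frac{n}{2}\log\left[\frac{1}{2\pi e n}\int_{\mathbb{R}^n} \frac{1}{f}\|\nabla f\|^2\,dx\right] \implies -S(f) \le \frac{n}{2}\log\left[\frac{\mathrm{tr}(F)}{2\pi e n}\right]. \tag{90}$$

Here $S(f)$ is the Boltzmann-Shannon entropy of $f$ and $F$ is the Fisher information matrix. As
is customary in information theory, the entropy power can be defined as $N(f)$ in (89) with
$C_G = 1$. Then the log-Sobolev inequality in the form in (90) is written as

$$\exp\left[-\frac{2}{n} S(f)\right] \le \frac{\mathrm{tr}(F)}{2\pi e n} \implies [N(f)]^{-1} \le \frac{1}{n}\,\mathrm{tr}(F).$$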



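For the more general case, starting with the log-Sobolev inequality for unimodular Lie groups
from Section 6.1 and letting $f(g) = |\psi(g)|^2$ gives

$$\int_G f(g)\log f(g)\,dg \le \frac{n}{2}\log\left[\frac{C_G}{2\pi e n}\int_G \frac{1}{f}\|\tilde{X} f\|^2\,dg\right] \implies -S(f) \le \frac{n}{2}\log\left[\frac{C_G}{2\pi e n}\,\mathrm{tr}(F)\right]. \tag{91}$$

The rest is the same as for the case of $\mathbb{R}^n$.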
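Theorem 6.1 can be illustrated numerically for $G = (\mathbb{R}, +)$, where $n = 1$ and $C_G = 1$. The
sketch below is only an illustration under stated assumptions: the two test densities (a Gaussian and
an equal-weight mixture of two unit-variance Gaussians), the integration range, and all function names
are hypothetical choices, and the integrals are approximated by a plain Riemann sum.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture(x, mu=1.5):
    """Equal-weight mixture of N(-mu, 1) and N(mu, 1): returns (f, df/dx)."""
    p, q = normal_pdf(x, -mu, 1.0), normal_pdf(x, mu, 1.0)
    return 0.5 * (p + q), 0.5 * (-(x + mu) * p - (x - mu) * q)

def gaussian(x, var=2.0):
    """Zero-mean Gaussian with variance var: returns (f, df/dx)."""
    f = normal_pdf(x, 0.0, var)
    return f, -x / var * f

def inverse_entropy_power_and_fisher(density, lo=-15.0, hi=15.0, m=6000):
    """Riemann-sum estimates of 1/N(f) = 2*pi*e*exp(-2*S(f)) and of F, for n = 1 and C_G = 1."""
    h = (hi - lo) / m
    S, F = 0.0, 0.0
    for k in range(m + 1):
        f, df = density(lo + k * h)
        S -= f * math.log(f) * h      # entropy S(f) = -int f log f dx
        F += df * df / f * h          # Fisher information F = int (f')^2 / f dx
    return 2.0 * math.pi * math.e * math.exp(-2.0 * S), F

inv_N_gauss, F_gauss = inverse_entropy_power_and_fisher(gaussian)
inv_N_mix, F_mix = inverse_entropy_power_and_fisher(mixture)
print(inv_N_gauss, F_gauss)   # the two values agree for a Gaussian
print(inv_N_mix, F_mix)       # the first value is smaller for the mixture
```

For the Gaussian the two quantities coincide, which reflects the sharpness of (85), while for the
mixture the inequality is strict.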

Starting with Gross's original form of log-Sobolev inequalities involving the heat kernel,
the following information-theoretic inequality results:

Theorem 6.2: The Kullback-Leibler divergence and Fisher-information distance of any arbitrary
pdf and the heat kernel are related as

$$D_{KL}(f \,\|\, \rho_t) \le \frac{C_G(t)}{2}\,D_{FI}(f \,\|\, \rho_t) \tag{92}$$

where in general given $f_1(g)$ and $f_2(g)$,

$$D_{FI}(f_1 \,\|\, f_2) = \int_G \left\|\frac{1}{f_1}\tilde{X} f_1 - \frac{1}{f_2}\tilde{X} f_2\right\|^2 f_1\,dg. \tag{93}$$



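Proof: Starting with (87), let $\phi(g,t) = [\rho(g,t)]^{-1/2}[f(g)]^{1/2}$ where $f(g)$ is a pdf. Then

$$\int_G |\phi(g,t)|^2\,\rho(g,t)\,dg = \int_G f(g)\,dg = 1$$

and so $\log\|\phi\|_{L^2(G,\rho_t)} = 0$, and we have

$$\frac{1}{2}\int_G f(g)\log\frac{f(g)}{\rho(g,t)}\,dg \le C_G(t)\int_G \left\|\tilde{X}\left([\rho(g,t)]^{-1/2}[f(g)]^{1/2}\right)\right\|^2 \rho(g,t)\,dg.$$

By using the chain rule and product rule for differentiation,

$$\tilde{X}\left([\rho(g,t)]^{-1/2}[f(g)]^{1/2}\right) = \frac{1}{2} f^{-1/2}\rho_t^{-1/2}\,\tilde{X} f - \frac{1}{2} f^{1/2}\rho_t^{-3/2}\,\tilde{X}\rho_t.$$

Substitution into the right hand side of (87) then gives (92).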
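For $G = (\mathbb{R}, +)$, where $C_{\mathbb{R}}(t) = t$, both sides of Theorem 6.2 have closed forms when
$f$ is Gaussian. The sketch below assumes that the inequality reads
$D_{KL}(f \| \rho_t) \le \frac{t}{2} D_{FI}(f \| \rho_t)$, with $f = N(m, s^2)$ (a hypothetical test
density) and $\rho_t = N(0, t)$, and evaluates the gap on a small grid of parameters.

```python
import math

def kl_divergence(m, s2, t):
    """D_KL(f || rho_t) for f = N(m, s2) and rho_t = N(0, t)."""
    return 0.5 * (s2 / t + m * m / t - 1.0 + math.log(t / s2))

def fisher_distance(m, s2, t):
    """D_FI(f || rho_t) = E_f[(f'/f - rho_t'/rho_t)^2] for the same pair."""
    s = math.sqrt(s2)
    return (s / t - 1.0 / s) ** 2 + (m / t) ** 2

def gap(m, s2, t):
    """(t/2) * D_FI - D_KL, which should be nonnegative under the assumed form of (92)."""
    return 0.5 * t * fisher_distance(m, s2, t) - kl_divergence(m, s2, t)

gaps = [gap(m, s2, t)
        for m in (-2.0, 0.0, 1.0)
        for s2 in (0.2, 1.0, 3.0)
        for t in (0.5, 1.0, 4.0)]
print(min(gaps))
```

In this Gaussian case the gap simplifies to $\frac{1}{2}[u - 1 - \log u]$ with $u = t/s^2$, which is
nonnegative and vanishes only when $s^2 = t$, independently of the mean $m$.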

In the functional analysis community from which log-Sobolev inequalities emerged, these
inequalities are rarely, if ever, stated in these terms. One exception is the work of Carlen [17],
which addresses Theorem 6.1 for the case of $G = \mathbb{R}^n$. Moreover, the author has not found analogs
of (90) in the context of Lie groups in the literature.



7 The Entropy-Power Inequality (or Lack Thereof) 

One of the fundamental inequalities of information theory is the entropy power inequality

$$N(f_1 * f_2) \ge N(f_1) + N(f_2)$$

for any pdfs $f_1$ and $f_2$ on $\mathbb{R}^n$ with $N(f_i)$ defined as in (89) for $C_{\mathbb{R}^n} = 1$. This was first stated
by Shannon together with a verification of the necessary conditions for it to be true. This
was followed up with proofs of sufficiency by Stam and Blachman [12, 66]. Without going into
too many details, the key technical points of their proofs require two properties. First,

$$f_1 * \rho_{t_1} * f_2 * \rho_{t_2} = f_1 * f_2 * \rho_{t_1} * \rho_{t_2}$$

(which is not a problem in $\mathbb{R}^n$ since convolution is commutative). Second, they also use a
scaling argument requiring that any pdf $f(x)$ that is scaled as $f_s(x) = s^{-n} f(x/s)$ will become
the Dirac delta function as $s \to 0$. That is not to say that these two properties are essential
to proving the entropy power inequality, but rather only that they are the properties that are
used in the most familiar proofs.
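A simple non-Gaussian instance of the entropy power inequality on $\mathbb{R}$ can be checked directly.
The sketch below assumes $f_1 = f_2$ uniform on $[0, 1]$ (an illustrative choice), so that $f_1 * f_2$ is
the triangular pdf on $[0, 2]$, and uses $N(f) = \exp(2S(f))/(2\pi e)$, that is, (89) with $n = 1$ and
$C = 1$. The helper names are hypothetical.

```python
import math

def entropy_power(S, n=1):
    """N = exp(2 S / n) / (2 pi e), i.e. the entropy power with C = 1."""
    return math.exp(2.0 * S / n) / (2.0 * math.pi * math.e)

def triangular_pdf(x):
    """Convolution of two uniform pdfs on [0, 1]: a triangle supported on [0, 2]."""
    return max(0.0, 1.0 - abs(x - 1.0))

def entropy(pdf, lo, hi, m=20000):
    """Midpoint-rule estimate of -int f log f dx, skipping points where f = 0."""
    h = (hi - lo) / m
    total = 0.0
    for k in range(m):
        f = pdf(lo + (k + 0.5) * h)
        if f > 0.0:
            total -= f * math.log(f) * h
    return total

S_uniform = 0.0                               # entropy of the uniform pdf on [0, 1]
S_sum = entropy(triangular_pdf, 0.0, 2.0)     # entropy of f1 * f2
N_sum = entropy_power(S_sum)
N_parts = 2.0 * entropy_power(S_uniform)
print(S_sum, N_sum, N_parts)
```

Here the entropy of the triangular pdf is $1/2$, so $N(f_1 * f_2) = 1/(2\pi)$, which exceeds
$N(f_1) + N(f_2) = 1/(\pi e)$.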

There is, however, somewhat of a conundrum. For compact Lie groups, the heat kernel $\rho_t(g)$
is a class function, and therefore satisfies the first condition, but there is no natural way to
rescale on a compact Lie group (not even on the circle group, $SO(2)$). In fact, it is easy to see
that on compact Lie groups the entropy power inequality does not hold. For example, the limiting
distribution on a compact Lie group is $\rho_\infty = 1$ with entropy $S(\rho_\infty) = 0$ and entropy power
$N(\rho_\infty) = 1$. Since $\rho_\infty * f = \rho_\infty$ for any pdf $f$, we get
$N(\rho_\infty * f) = 1 \ngeq 1 + N(f)$ since $N(f) > 0$ always.

On the other hand, for some groups it is possible to introduce a concept of scaling. For
example, it is possible to do this in the Heisenberg group, roughly speaking, because all
coordinate directions extend to infinity. Groups that admit a scaling property have been studied
extensively [31]. However, whether the heat equations on such groups yield solutions that
are class functions then becomes an issue. Regardless, for the groups of primary interest in
engineering applications, i.e., the rotation and rigid-body motion groups, the prospects for
an entropy power inequality appear slim.
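The failure of the entropy power inequality on compact Lie groups described above can be reproduced
numerically on the circle group $SO(2)$. The sketch below is an illustration under stated assumptions:
the Haar measure is normalized so that the volume of the group is 1, the grid size `M` and the test
pdf $f(\theta) = 1 + \frac{1}{2}\cos\theta$ are arbitrary choices, and the entropy power is taken as
$\exp(2S)$ for $n = 1$, dropping the constant prefactor because it cancels from both sides of the
inequality.

```python
import math

M = 128                                           # number of grid points on the circle
thetas = [2.0 * math.pi * k / M for k in range(M)]

def entropy(f):
    """S(f) = -int f log f dg, with the Haar measure normalized to total volume 1."""
    return -sum(v * math.log(v) for v in f) / M

def convolve(a, b):
    """(a * b)(theta_i) = int a(phi) b(theta_i - phi) dphi, evaluated on the grid."""
    return [sum(a[j] * b[(i - j) % M] for j in range(M)) / M for i in range(M)]

def entropy_power(f):
    """exp(2 S(f)) for n = 1 (constant prefactor omitted, see the assumptions above)."""
    return math.exp(2.0 * entropy(f))

rho_inf = [1.0] * M                               # limiting (uniform) distribution
f = [1.0 + 0.5 * math.cos(th) for th in thetas]   # hypothetical test pdf
lhs = entropy_power(convolve(rho_inf, f))         # N(rho_inf * f)
rhs = entropy_power(rho_inf) + entropy_power(f)   # N(rho_inf) + N(f)
print(lhs, rhs)
```

Since convolution with the uniform density returns the uniform density, the left side stays at 1 while
the right side is strictly larger, so the inequality fails for every choice of $f$.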

8 Conclusions 

By collecting and reinterpreting results relating to the study of diffusion processes, harmonic
analysis, and log-Sobolev inequalities on Lie groups, and merging these results with new
definitions of covariance and Fisher information, many inequalities of information theory were
extended here to the context of probability densities on unimodular Lie groups. In addition,
the natural decomposition of groups into cosets, double cosets, and the nesting of subgroups
provides some inequalities that result from the Kullback-Leibler divergence of probability
densities on Lie groups. Some special inequalities related to finite groups were also provided.

While the emphasis of this paper was on the discovery of fundamental inequalities and 
the introduction of Lie group concepts to the information theory audience, the motivation for 
this study originated with applications in robotics and other areas. Though these applications 
were not explored here, references to the literature pertaining to robot motion and image 
reconstruction were provided.